ABSTRACT


In this thesis the benefits of parallel computing using a workstation cluster are
explored for two typical Naval applications. The applications are examples of one
off-line and one on-line program. The off-line program is a Navy program currently in
use by the Naval Space Command in its satellite prediction model. The on-line program
is a large grain data flow problem with critical throughput requirements and
represents a hypothetical combat weapons system. Data and function decomposition
techniques are used in both applications. Speedup and throughput are the performance
metrics studied.

The software employed was the Parallel Virtual Machine (PVM) by the Oak Ridge
National Laboratory. PVM enables a network of heterogeneous workstations to appear
as a parallel multicomputer to the user programs. PVM runs over the workstation
operating system and provides the user with a set of library calls for message passing
and process creation.

I. INTRODUCTION


All successful organizations depend on reliable and timely data management. As
an organization evolves, its data system requirements also increase. The United States
Navy is an example of one such organization. Its data processing requirements demand
ever more computing speed and capacity. An economical solution to this need is to
network the workstations present in abundance and utilize parallel processing. To this
end, this thesis provides performance results of two typical applications on a
workstation cluster.

With  the  introduction  of  small,  relatively  inexpensive  computers,  a  vast  amount  of 
computing  resources  are  often  left  idle  for  a  long  period  of  time.  A  ship  often  has  this 
characteristic.  A  ship's  complement  of  computers  is  usually  used  for  intermittent  word 
processing  or  single  dedicated  computational  tasks.  With  these  computers  networked 
together,  a  lot  of  unused  CPU  power  is  available.  In  order  to  tap  into  these  unused 
assets,  parallelization  software  tools  have  been  developed.  These  programs  operate  at 
the  user  level  like  an  extra  layer  of  operating  system  code. 

The Navy's computation requirements can be classified as off-line and on-line data
processing programs. An off-line program does not require continuous, time-critical
processing. It executes once per some specified time period with clear beginning and
ending times. An on-line program does require continuous computational assets for its
functions. It is characterized by constant, non-stop, real time processing.

For this thesis, one example of each type of program was parallelized using a
software tool. The tool used for parallelization was the Parallel Virtual Machine
(PVM). The off-line program was the Naval Space Command's PPT2 Analytical
Satellite Position Propagation Program. The on-line program is a hypothetical
Shipboard Combat Weapon System.

A. PVM: PARALLEL VIRTUAL MACHINE

PVM is a software library, currently being refined, developed by the Oak Ridge
National Laboratory (ORNL). It is a software system that enables a collection of
heterogeneous computers to be used as a coherent and flexible concurrent
computational system [Ref. 1]. PVM was chosen because it is relatively easy to use, is
an emerging standard for software of its kind, and its price is reasonable: it is
currently available free of charge from ORNL, and installation is relatively easy.
PVM version 3.2 was used for this thesis. A short description on acquiring and
installing PVM appears in Appendix A.

B.  THESIS  SCOPE  AND  CONTRIBUTION 

The goal of this thesis was to exploit the benefits of parallel computing, as cost
effectively as possible, using a software tool. The two applications process large
amounts of data and represent contrasting requirements while lending themselves to
parallel processing. The positive aspect of parallelizing these procedures is the
performance improvement over their serial counterparts. Parallelization could have
been accomplished using a specific parallel multicomputer. These systems tend to be
large and expensive, and tie up extensive human and fiscal resources for a limited
number of uses. PVM provided the desired cost effectiveness. While, arguably, PVM
may not accomplish the tasks as fast as, say, an INTEL iPSC/2 hypercube, the process
execution times were satisfactory for the applications tested. Furthermore, they were
accomplished on a shared network without noticeably disturbing other system users.

The representative use of a loosely shared network in this thesis is the most
noteworthy aspect. For instance, the on-line application was tested as though it were in
a shipboard environment. It performed as desired while simulated shipboard tasks such
as supply data-base upkeep, report and correspondence word processing, and
computerized engineering parameter measuring were being carried out.

C.  THESIS  ORGANIZATION 

This thesis is organized as follows. Chapters II and III cover the Naval Space
Command's PPT2 program. Chapter II specifically describes PPT2 itself, the modes of
parallelization used, and the variable mode which was finally used. Chapter III reports
the results obtained for this application and recommendations for possible future
improvements to the model.

Chapters  IV  and  V  deal  with  the  hypothetical  Combat  System  model.  Chapter  IV 
details  the  design  and  requirements  of  the  model  and  Chapter  V  contains  the  results. 

Chapter  VI  contains  overall  conclusions  and  areas  for  further  study. 

II. PARALLELIZATION OF PPT2


Currently the Naval Space Command tracks over 6000 objects orbiting around the
earth. With more and more countries entering space exploitation, and as the United
States increases its emphasis on space communication, this data set of satellites will
foreseeably increase dramatically in the future. These increases in the satellite catalog
will increase the computational demands on the computer tasked with orbit prediction.
If the accuracy of NAVSPACECOM's orbital model is increased, or multiple calls to the
orbit prediction algorithm are made for accuracy, the computational demands may be
too much of a burden for a serial machine [Ref. 2]. Given these
computational loads, and the time dependency of the results, parallel processing of the
catalog is a logical extension.

A.  PPT2 

PPT2 is the NAVSPACECOM program which implements an analytic satellite
motion model based on the Brouwer-Lyddane orbital prediction theory. Reference [2]
goes into great depth describing this theory and how PPT2 implements the theory in
FORTRAN. For this thesis, neither the accuracy of the PPT2 program nor the theory of
how it works was relevant. The one major aspect of PPT2 considered was the required
size of each satellite data record, which is 84 elements. No other internal details of
PPT2 are discussed here.

B.  PARALLEL  DECOMPOSITION  METHODS 

Given a program and its associated data set, there are two primary ways to process
it in parallel. The program can be separated into individual sections with a processor
dedicated to compute its respective part, much like a factory assembly line. The other
primary method is dividing up the data set and sending parts to many separate
processors all running the same algorithm, but on different data. Each of these methods
is highly dependent on the program description and the size of the data.

Although  the  PPT2  algorithm  is  sufficiently  large  to  break  down  into  individual 
computational  nodes,  the  data  set  size  is  such  that  data  decomposition  is  more  effective. 
These  observations  are  validated  in  Reference  [2].  Control  decomposition  had  been 
previously  attempted  but  was  not  successful  [Ref.  3].  Based  on  these  results,  all  of  the 
parallelization  methods  used  were  various  ways  of  decomposing  the  satellite  catalogue 
and  distributing  it  to  multiple  nodes  executing  PPT2. 

C.  DECOMPOSITION  STRATEGIES 

The basic algorithm for all of the decomposition strategies used a master/slave
distribution network. For all the programs, there was one supervisor (master) node
which decomposed the data set and distributed it to the worker (slave) nodes. Each
worker ran on a separate processor and sent its results to a gathering node which
printed the results to a file and reported to the supervisor when the process had
completed for all satellites. Figure 2.1 graphically presents these relationships.
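
The supervisor/worker/gatherer arrangement described above can be illustrated with a minimal Python sketch. It is not the thesis code (the FORTRAN programs are in Appendix B): threads and in-memory queues stand in for PVM processes and message passing, the gatherer stores results in a dictionary instead of printing them to a file, and `propagate` is a hypothetical stand-in for one PPT2 call.

```python
import queue
import threading

def propagate(record):
    """Hypothetical stand-in for one PPT2 call on one satellite record."""
    return sum(record)

def worker(inbox, outbox):
    # Process records until the supervisor sends the None sentinel.
    while True:
        item = inbox.get()
        if item is None:
            outbox.put(None)          # tell the gatherer this worker is done
            return
        index, record = item
        outbox.put((index, propagate(record)))

def gatherer(outbox, n_workers, results, done):
    # Collect results from all workers, then report completion.
    finished = 0
    while finished < n_workers:
        item = outbox.get()
        if item is None:
            finished += 1
        else:
            index, value = item
            results[index] = value
    done.set()                        # plays the role of "report to the supervisor"

def run(records, n_workers=4):
    inboxes = [queue.Queue() for _ in range(n_workers)]
    outbox = queue.Queue()
    results = {}
    done = threading.Event()
    threads = [threading.Thread(target=worker, args=(q, outbox)) for q in inboxes]
    threads.append(threading.Thread(target=gatherer,
                                    args=(outbox, n_workers, results, done)))
    for t in threads:
        t.start()
    # Supervisor: deal records out to the workers in turn, then send sentinels.
    for i, rec in enumerate(records):
        inboxes[i % n_workers].put((i, rec))
    for q in inboxes:
        q.put(None)
    done.wait()
    for t in threads:
        t.join()
    return [results[i] for i in range(len(records))]
```

Calling `run(records, n_workers=8)` returns the results in input order regardless of which worker finished first, which mirrors the role of the gathering node.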

To get a general understanding of the decomposition requirements, multiple
decomposition strategies were developed, each with benefits over the previous strategy,
until four different methods had been explored. All the methods endeavored to keep the
worker processors as busy as possible to increase speedup and efficiency. Each
method is described below.

Figure 2.1. Supervisor/Worker Dependency Graph.

1. ds1: Send/Request One at a Time

For this strategy, the supervisor node initially sends one satellite to each
individual worker node and waits for the worker to individually request another
satellite. This method brought out the high PVM communications overhead which
needed to be overcome for adequate speedup.

2.  ds2:  Send/No  Request 

The supervisor node for this routine sent one satellite at a time to each
worker node until the input file was exhausted. This process reduced the
communications overhead between the supervisor and worker, but it did not keep all
the processors busy for a sufficiently long time.

3.  ds3:  Send  Block 

For this scheme, the supervisor divided the number, S, of input satellites by
the number, n, of worker processors. The supervisor then decomposed the input data
into blocks of S/n size and distributed these to each processor individually. This was
much more efficient than the previous two methods, but for a large n, n > 8, the
workers numbered eight and above were still not getting data fast enough to achieve
effective computational overlap between processors.

4.  ds4:  Send  Half  Block 

For this scheme, the supervisor divided the S/n size block by two, then sent
the two half blocks to each worker so all the workers had one half of their data while
the supervisor was sending the second half. These schemes were used with data sets of
600 and 1200 satellites. For experimentation, PVM was started on eighteen different
workstations so measurements could be taken for one to sixteen working nodes.
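
The block (ds3) and half-block (ds4) decompositions can be sketched as index arithmetic. The following Python is an illustration only: the function names are hypothetical, and the way a remainder is spread over the first workers is an assumption, since the text does not say how the supervisor handled an S that is not divisible by n.

```python
def split_blocks(num_satellites, num_workers, blocks_per_worker=1):
    """Return, per worker, a list of (start, stop) index ranges.

    blocks_per_worker=1 mimics ds3 (one block of about S/n records);
    blocks_per_worker=2 mimics ds4 (two half blocks).
    """
    per_worker = []
    base, extra = divmod(num_satellites, num_workers)
    start = 0
    for w in range(num_workers):
        share = base + (1 if w < extra else 0)   # assumption: spread any remainder
        b_base, b_extra = divmod(share, blocks_per_worker)
        ranges = []
        for b in range(blocks_per_worker):
            size = b_base + (1 if b < b_extra else 0)
            ranges.append((start, start + size))
            start += size
        per_worker.append(ranges)
    return per_worker

def send_order(per_worker):
    """Order in which a supervisor would send blocks: the first block to every
    worker, then the second block to every worker, and so on."""
    n_blocks = len(per_worker[0])
    return [(w, per_worker[w][b])
            for b in range(n_blocks)
            for w in range(len(per_worker))]
```

For the 600-satellite data set and eight workers, `split_blocks(600, 8)` gives each worker one block of 75 records, while `split_blocks(600, 8, 2)` gives two half blocks per worker; `send_order` then lists the first halves for all workers before any second half, which is the overlap ds4 relies on.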

Figure 2.2. PVM Applied to PPT2 Using 600 Input Satellites. (Plot title: The four
decomposition strategies applied to PPT2; horizontal axis: Number of Processors.)

Figure 2.3. PVM Applied to PPT2 Using 1200 Input Satellites. (Plot title: The four
decomposition strategies applied to PPT2.)

The collected data consisted of the actual execution time taken to process all the
elements in the input files. The programs were run ten times for each number of
processors in order to get a good average time. They were executed at times when the
network was minimally used to avoid, as much as possible, bus contention with other
users. The results of these are given in Figures 2.2 and 2.3. These figures show a
definite advantage in sending two input blocks of data to each worker node over the
other schemes.

Some other decomposition strategies were experimented with, but not in as much
detail. One strategy was to send the entire input data to all of the workers
simultaneously and let the worker nodes extract the data they were to use. This method
was memory prohibitive and its execution time was about the same as ds2 from Figures
2.2 and 2.3. Other data distribution techniques involved various methods of packing
and unpacking the data to be sent via PVM. Only the data block decomposition
schemes could take advantage of these attempts, but the execution time improvements
were slight.

D.  MULTIPLE  BLOCK  DECOMPOSITION  SCHEME:  DS5 

The data decomposition scheme ds4 was modified to send a variety of block sizes
depending on the size of the input and the number of working nodes used. In this
scheme, ds5, the supervisor still sent a block of data to each worker, then the worker
extracted one satellite at a time from its input buffer and sent a block of results, equal
in size to its input block, to the gathering node. The FORTRAN code for the ds5
supervisor and worker/gathering nodes is in Appendix B. In PVM, the buffer
manipulation time is the costliest aspect of communications, which is why this scheme
optimized the performance. Sending blocks of data between processors, vice one data
element at a time, minimized the buffer manipulation, which resulted in lower execution
times for this data distribution scheme.

The next chapter provides the results of using this scheme. Theoretical execution
time equations were developed for this scheme and compared to the actual results.
The optimal number of processors and number of input blocks to use were also
calculated along with values for speedup.

III. RESULTS OF PPT2 WITH PVM


The results presented in this chapter were obtained using the data block
decomposition strategy, ds5, discussed in Chapter II. Eight working nodes were used
for all ds5 program runs and were used to obtain the data for all the figures in this
chapter. The ds5 supervisor and worker programs were run under PVM, on the Naval
Postgraduate School's ECE local area network of various SUN/SPARC workstations.
The ECE LAN is an Ethernet based network of various types of workstations. In order
to maintain data integrity, only SPARC IPX and SPARC II machines were used. These
machines have 40 MHz processors, have been configured with 32 Mbytes of system
memory, and are essentially the same systems.

A.  INITIAL  WORKER  EXECUTION  TIME  EQUATION  DERIVATIONS 

To determine the length of time required to run the parallel algorithm, ds5, the
execution time of each working node needed to be determined. This execution time was
broken down into three phases: setup, calculation, and breakdown. During the setup
phase the worker node waited for and received the next input block from the
supervisor. The calculation phase was the time it took for PPT2 to execute on the entire
input block of data. The breakdown phase was simply the period in which the worker
node packed and sent the results to the gathering node.

In order to obtain an expression for the three phase times, certain variables need to
be introduced to represent applicable parts of the program process. Table 3.1 contains a
list of the basic variables used and their definitions. Using the variables in Table 3.1,
expressions for the setup time, t_s, the calculation time, t_c, and the breakdown time, t_b,
were derived for the ith worker processor, P_i.

TABLE 3.1. BASIC VARIABLE LIST.

Variable   Definition
S          total number of satellites in the input file
t_0        node process initialization time
t_gm       time for gathering node to report to the supervisor the process is complete
n_b        number of blocks sent to each worker
c_f        fixed communications time for buffer setup and network access for sending records
c_ps       communications time required to pack and send one satellite record
c_upf      fixed communications time to unpack the input buffer
c_upps     communications time to unpack one satellite record
k          number of working processors used
S_p        number of satellites sent to each worker = S/k
S_b        number of satellites per data block = S_p/n_b
T_ppt      time for PPT2 to operate on one satellite record

1. Setup Phase Timing Analysis

The time it takes for the ith node to set up depends on the time it takes for the
master to send the data blocks and the time required to unpack the input buffer.
Initially, the working node on processor P_i will have to wait for the master to
send data blocks to all the workers j, where j < i, before the first block is sent to P_i.
The time required to send this first block of data, t_fb, and the time to unpack each block
make up the setup time.

The time required to send the first block is represented by Equation 3.1:

$$ t_{fb} = i\,t_{bl} \tag{3.1} $$

where t_bl is the time to send one block of data, which is the fixed net communications
time added to the product of the communications time per satellite and the number of
satellites per block, as stated in Equation 3.2.

$$ t_{bl} = (c_f + c_{ps} S_b) \tag{3.2} $$

The time to unpack the buffer, t_u, is the time spent by P_i to unpack all of the blocks of
data. This time is expressed in Equation 3.3.

$$ t_u = n_b\,(c_{upf} + c_{upps} S_b) \tag{3.3} $$

The total setup time can now be expressed as:

$$ t_s = t_{fb} + t_u \tag{3.4} $$

which is simply the sum of the first block communications time and the unpacking time
for all of the blocks of data.

2.  Calculation  Phase  Timing  Analysis 

The calculation time is the time it takes for the PPT2 algorithm to process one
block of satellite records. Since t_c is a function of the block size, the equation for the
calculation time is:

$$ t_c = T_{ppt} S_b \tag{3.5} $$

3. Breakdown Phase Timing Analysis

The breakdown phase is the time required for the working node to send one
block of results to the gathering node. The expression for t_b is:

$$ t_b = (c_f + c_{ps} S_b) = t_{bl} \tag{3.6} $$

Using the equations for the three phases and empirical values for the variables, which
will be discussed later, the worker's total execution time was determined. The
execution times of eight worker nodes, given four input blocks of data, are shown in
Figure 3.1. Each processor's phase times are described by two lines.
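
The three phase times can be evaluated numerically once values for the Table 3.1 variables are chosen. The Python sketch below follows the phase descriptions in the text (first-block wait plus unpacking for setup, one PPT2 pass over a block for calculation, one block send for breakdown). The symbol names are a reading of the garbled source, and the numbers in the usage note are hypothetical placeholders, not the Appendix C measurements.

```python
from dataclasses import dataclass

@dataclass
class Params:
    S: int          # total satellites in the input file
    k: int          # number of worker processors
    n_b: int        # blocks sent to each worker
    c_f: float      # fixed send cost per block (s)
    c_ps: float     # pack-and-send cost per record (s)
    c_upf: float    # fixed unpack cost per block (s)
    c_upps: float   # unpack cost per record (s)
    T_ppt: float    # PPT2 time per record (s)

def block_size(p):
    # S_b = S_p / n_b, with S_p = S / k
    return p.S / (p.k * p.n_b)

def t_block(p):
    # Time to send one block: fixed cost plus per-record cost.
    return p.c_f + p.c_ps * block_size(p)

def t_setup(p, i):
    # Worker i (1-based) waits for blocks to workers 1..i-1, receives its own,
    # and unpacks all n_b of its blocks over the run.
    first_block = i * t_block(p)
    unpack_all = p.n_b * (p.c_upf + p.c_upps * block_size(p))
    return first_block + unpack_all

def t_calc(p):
    # One PPT2 pass over one block.
    return p.T_ppt * block_size(p)

def t_breakdown(p):
    # Sending one block of results costs the same as sending one input block.
    return t_block(p)
```

With the placeholder values `Params(S=4800, k=8, n_b=4, c_f=0.01, c_ps=0.001, c_upf=0.002, c_upps=0.0005, T_ppt=0.002)`, each block holds 150 records, so one block takes 0.16 s to send and 0.3 s to process.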

Figure 3.1. ds5 Worker Execution Times Using Eight Worker Processors. (Plot title:
Setup/Execution/Breakdown overlap for 8 workers and 4 data blocks; horizontal axis:
Execution Time (secs); vertical axis: Processor Number.)

The setup times are the lines on the processor number axis, and the execution and
breakdown times are on the line one half space below the processor number. The blank
space between the worker's breakdown phase and the next setup time is idle time. This
idle time is clearly the result of the communications time required by the master to send
blocks to all the working nodes taking longer than the execution time of PPT2 on each
processor. Given the fact PPT2 may need to be run several times for accuracy or
tracking requirements, the calculation time needs to be scaled by some constant, A, to
take into account multiple iterations. The variable A is the number of times PPT2 is
executed on each block of data.

B.  EXECUTION  TIME  EQUATIONS 

Looking at Figure 3.1 again, it is clear the worker execution time for the ith
worker, for any i, i = 1,...,k, is the total setup time added to one calculation and
breakdown time. This is true unless the calculation time dominates over the
communications time. As a result, instead of a single equation for the total worker
execution time, P_i_runtime, there are two equations depending on the value of A. Thus,
the total time to execute a worker node on processor P_i, or P_i_runtime, is determined using
Equations 3.7 or 3.9.

$$ P_{i\_runtime} = t_s + \left[ (n_b - 1)\left( k\,t_{bl} - (c_{upf} + c_{upps} S_b) \right) \right] + A\,t_c + t_b \tag{3.7} $$

for

$$ A \le \frac{(k - 1)\,t_{bl} + (c_{upf} + \ldots)}{t_c} \tag{3.8} $$

and

$$ P_{i\_runtime} = t_s + n_b\,(A\,t_c + t_b) \tag{3.9} $$

for

$$ A > \frac{(k - 1)\,t_{bl} + (c_{upf} + \ldots)}{t_c} \tag{3.10} $$

The bracketed term in Equation 3.7 is the time in between the end of a
breakdown period and the beginning of the next calculation phase. This time is simply
the time required by the supervisor to distribute a data block to all k workers for each
block except the first block. The subtraction of the unpack time within the brackets is
required because the expression for the setup time is made of unpack times and
Equation 3.7 only relies on the unpack time for the final block.


The two expressions for A are taken from Figure 3.1. Equation 3.7 applies when
the total calculation and breakdown times are less than the time between setup
phases, so that the communications cost is dominant. Conversely, Equation 3.9 is for the
case when the number of iterations of PPT2 causes the calculation phase to dominate.

From the above equations, the total execution time, T_E, of the parallel algorithm
is:

$$ T_E = t_0 + P_{k\_runtime} + t_{gm} \tag{3.11} $$

It should be noted that this equation uses the operation time of the kth worker. The kth
worker is used because it is at the end of the data distribution chain and takes longer to
complete execution relative to the other workers.
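
The two worker run-time regimes and the total execution time can be sketched in Python as follows. This is an illustration under stated assumptions: the two expressions follow the prose description (a communications-bound schedule paced by the supervisor's round of k block sends, and a calculation-bound schedule paced by the worker itself), and the sketch simply takes the larger of the two instead of testing A against a threshold. All parameter names and the numbers in the usage note are hypothetical.

```python
def worker_runtime(i, k, n_b, A, t_bl, t_unpack, t_c):
    """Run time of worker i (1-based) under the two regimes described above.

    t_bl     : time to send one block (also taken as the breakdown time)
    t_unpack : time to unpack one block
    t_c      : time for one PPT2 pass over one block
    A        : number of PPT2 passes per block
    """
    t_s = i * t_bl + n_b * t_unpack          # setup: first-block wait + all unpacks
    t_b = t_bl
    # Communications-bound: each later block arrives one supervisor round later.
    comm_bound = t_s + (n_b - 1) * (k * t_bl - t_unpack) + A * t_c + t_b
    # Calculation-bound: the worker is never idle once its first block arrives.
    calc_bound = t_s + n_b * (A * t_c + t_b)
    # Assumption of this sketch: the slower schedule is the one that applies.
    return max(comm_bound, calc_bound)

def total_time(t_0, t_gm, k, n_b, A, t_bl, t_unpack, t_c):
    """Total parallel time: start-up, the k-th (last-served) worker, final report."""
    return t_0 + worker_runtime(k, k, n_b, A, t_bl, t_unpack, t_c) + t_gm
```

With placeholder values k = 8, n_b = 4, t_bl = 0.16, t_unpack = 0.077 and t_c = 0.3, the communications-bound expression governs for A = 1 and the calculation-bound expression governs for A = 10, which is the qualitative switch the text describes.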

C.  PARALLEL  AND  SERIAL  PROGRAM  COMPARISON 

The comparison of the parallel program vs. serial program entailed theoretical and
actual results. In order to accomplish the theoretical comparison, values for the
variables in Table 3.1 were needed. Appendix C contains the empirical results from
studying the performance of PVM on the ECE SUN network. These values were then
used in the preceding equations for empirical evaluation of the two programs. The total
execution time of the serial program was taken to be simply the time for PPT2 to
operate on one satellite record multiplied by the total number of satellites in the input
file. Again, input and output times were assumed to have been roughly equal for both
programs so they were left out of the evaluations.

The input file used for testing consisted of the same satellite records used in
Reference p]. This data file consisted of ten different records which were then
duplicated for a total of 4800 input records. An unclassified copy of a portion of the
catalog was obtained from the Naval Space Command after the research was
completed, and was not used for program development or testing.

Figure 3.2 shows the final comparative results. The theoretical lines refer to using
Equation 3.11. The actual lines represent data obtained from running the serial program
and ds5 (utilizing 8 workers), using values of A from 1 to 10 for both programs. A
block size of four was also used for the parallel algorithm. Figure 3.2 shows the
parallel program performed better than the serial program as the number of calls to
PPT2 was increased. This performance improvement was predicted from the theoretical
plots even though the actual serial program performed better than expected and the
actual parallel program performed slightly worse than expected. It can also be noted
that when A ≈ 7, Equation 3.11 switches from using Equation 3.7 for the worker
processor run time to Equation 3.9. The most dramatic event this figure displays is the
fact the parallel program did not perform as well as the serial program for A = 1.
Since one of the assumptions of this research was that PPT2 will most likely be
executed multiple times, the results for the case A = 1 are to be noted but should
not detract from the benefits of parallelization.

Figure 3.2. Serial vs. Parallel Results Using ds5 With Eight Workers. (Plot title:
Serial vs Parallel PPT2 execution time for 4800 satellites; horizontal axis: Iteration
Multiplier A; vertical axis: Execution Time (secs).)


Overall, the ds5 algorithm using PVM was able to process the satellite catalog
faster than the serial program. These results were observed when the ECE network was
being heavily used and also when the network had little activity on it. Also, the
empirical data for the actual program times in Figure 3.2 is merely a representative
result of executing the programs at one certain time of the day; different numbers
were obtained at different times, but again, the relative performance results were the
same.


D.  SPEEDUP  COMPARISON 

One standard figure of merit in comparing two algorithms is speedup. Speedup, in
this case, is the ratio of the serial results to the parallel results. The same data from the
previous section was used to determine the speedup ratios for values of A ranging from
one to ten. The speedup results are shown in Figure 3.3. Even though the actual
speedup was less than expected, there was a definite decrease in execution time, thus an
increase in speedup, when parallel execution was used instead of serial execution.
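
As a small worked illustration of the speedup ratio defined above, the following Python computes speedup and the related efficiency figure from a pair of execution times. The function names and the timings in the usage note are hypothetical and are not measurements from this study.

```python
def speedup(serial_time, parallel_time):
    """Ratio of serial execution time to parallel execution time."""
    return serial_time / parallel_time

def efficiency(serial_time, parallel_time, processors):
    """Speedup per processor; 1.0 would mean perfectly linear scaling."""
    return speedup(serial_time, parallel_time) / processors
```

For example, a serial run of 120 s against a parallel run of 40 s on eight workers gives a speedup of 3.0 and an efficiency of 0.375. A speedup below one, as in a case where the serial program is faster, means the parallel overhead outweighed the gain.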

E.  OPTIMUM  NUMBER  OF  PROCESSORS  TO  USE 

The execution time savings have been demonstrated in the previous sections, but
one other question of interest is what the optimum number of processors to use would
be. This can be determined by setting the derivative of Equation 3.11, with respect to
the variable k, equal to zero and solving for k, which provides the optimum number of
worker processors to use. Adding one processor for the supervisor node and one
processor for the gathering node then gives the final value for the optimum number of
processors.

Figure 3.3. Serial vs. Parallel (ds5) Speedup Ratios Using Eight Workers. (Plot title:
Serial vs Parallel Speedup Ratios; horizontal axis: Iteration Multiplier A; vertical axis:
Serial Execution Time/Parallel Execution Time.)


Using Equation 3.7 the optimum product of worker processors and data blocks can
be determined. When Equation 3.9 is used, only the number of worker processors
can be found. The results for the two equations are given in Equations 3.12 and 3.13
respectively.

$$ k\,n_b = \ldots \tag{3.12} $$

$$ k = \ldots \tag{3.13} $$

With the exception of the number of blocks in Equation 3.12, both of these equations
are identical. Equation 3.12 is much more flexible since the number of different
processors available may be limited while the number of blocks is not. For example,
using Equation 3.12, the empirical values in Appendix C, and setting A = 2, the
optimum k n_b product is 63.35. If there are twelve total processors available, then by
subtracting one processor for the gathering node and one processor for the supervisor
node, there are ten processors available for the workers. Solving for the optimum
number of blocks to send yields 6.335, so n_b should be six or seven.
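
One way to see where an optimum of this kind comes from is sketched below. It is a reconstruction under assumptions, not a quotation of Equations 3.12 and 3.13: it takes the time to send one block as $t_{bl} = c_f + c_{ps} S_b$, the per-block unpack time as $c_{upf} + c_{upps} S_b$, the calculation time as $t_c = T_{ppt} S_b$, and $S_b = S/(k\,n_b)$, with symbol names read from Table 3.1.

```latex
% Communications-dominated case, evaluated for the k-th worker:
\begin{align*}
T_E &= t_0 + t_{gm} + n_b k\,t_{bl} + (c_{upf} + c_{upps} S_b) + A\,t_c + t_{bl} \\
    &= \text{const} + c_f\,(k n_b) + \frac{(c_{ps} + c_{upps} + A\,T_{ppt})\,S}{k n_b},
\end{align*}
% so setting the derivative with respect to the product k n_b to zero gives
\[
  k\,n_b = \sqrt{\frac{S\,(c_{ps} + c_{upps} + A\,T_{ppt})}{c_f}} .
\]
% Calculation-dominated case, evaluated for the k-th worker:
\begin{align*}
T_E &= t_0 + t_{gm} + k\,t_{bl} + n_b\,(c_{upf} + c_{upps} S_b) + n_b\,(A\,t_c + t_{bl}) \\
    &= \text{terms independent of } k + c_f\,k + \frac{(c_{ps} + c_{upps} + A\,T_{ppt})\,S}{k},
\end{align*}
% and setting the derivative with respect to k to zero gives
\[
  k = \sqrt{\frac{S\,(c_{ps} + c_{upps} + A\,T_{ppt})}{c_f}} .
\]
% Worked arithmetic from the example in the text:
\[
  k\,n_b = 63.35,\quad k = 10 \;\Rightarrow\; n_b = 63.35 / 10 = 6.335 .
\]
```

Under these assumptions the two results differ only by the factor $n_b$ on the left-hand side, which agrees with the remark that the two equations are identical apart from the number of blocks.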

F.  PPT2  AND  PVM  WITH  ACTUAL  DATA 

As mentioned earlier, a sample of the satellite catalog was obtained. Though it was
not used in determining which parallel algorithm to use or in ascertaining the values in
Appendix C, it was used to produce plots similar to Figures 3.2 and 3.3. The data set
contained 6795 satellite records. The serial vs. parallel comparison plot is provided in
Figure 3.4, and the speedup comparison is shown in Figure 3.5. Again, the parallel
algorithm ds5 was used with eight worker processors.

(Figure 3.4: plot titled "Serial vs Parallel PPT2 execution time for actual catalog
data"; horizontal axis: Iteration Multiplier A.)

Figure 3.5. Serial vs. Parallel (ds5) Speedup Ratios Using the Catalog Data. (Plot
title: Serial vs Parallel Speedup Ratios; horizontal axis: Iteration Multiplier A.)


G.  PPT2  CONCLUSIONS 

The results of this chapter clearly demonstrate the effectiveness in reducing the
overall execution time using a parallel algorithm. Also, this algorithm was run using a
parallelization software tool, PVM, on a loosely coupled network of SUN workstations
instead of a dedicated parallel multicomputer. Interestingly, the results for the actual
catalog data were closer to the theoretical estimates than the data used in the previous
sections. This validates the earlier results even though they were more conservative
than the catalog results. Overall, using PVM and the multiblock data decomposition
scheme resulted in improved PPT2 operation, which was the goal of this study.
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IV.  A  THROUGHPOT  CRITICAL  ON-LINE  APPUCATION 


The second Naval application studied was the on-line hypothetical combat weapon
system. Future combat systems will be large-grain data-flow, throughput-critical
systems. These systems will be required to process electronic signals to detect, track,
and determine a fire control solution for increasingly sophisticated threats. An example
of the next generation of Navy signal processors is the AN/UYS-2 Digital Signal
Processing System (also known as the Enhanced Modular Signal Processor, EMSP),
which implements data-flow parallel processing to achieve high throughput rates for
this type of environment in a single tightly coupled system [Ref. 4]. The hypothetical
system presented here demonstrates the possible use of a loosely coupled, LAN-based
cluster of processors for large-grain data-flow parallelization, in contrast to a tightly
coupled system such as the EMSP.

A.  PROCESSING  IN  A  HYPOTHETICAL  COMBAT  SYSTEM 

The  hypothetical  combat  system  is  defined  by  the  process  node  graph  of  Figure 
4.1.  This  graph  was  designed  to  take  into  account  the  normal  computational 
requirements of the combat system. The two leftmost branches, the paths through
nodes P4D and P2B, represent the surface and air and the subsurface fire control
solution steps. The rightmost branch represents the surface and air tracking iterations.
Each node is marked with the processor it resides on and its own identification
letter. For instance, P4D stands for processor four, program D. The lines connecting
the nodes have arrows indicating data flow paths. The number attached to each line is
a measure of the size of the data message passed between the two nodes.

The cumulative communication and node execution time for each processor is
approximately equal, simulating a load-balanced graph. Another constraint placed on
the graph is that certain nodes must be allocated to certain processors due to memory
dependencies. It is also assumed that all the nodes execute once in a given period, which
will be defined later. Though this is purely a hypothetical situation, it adequately
simulates a possible on-line system.

B. PROBLEMS WITH IMPLEMENTATION USING PVM

PVM presented a few distinct problems for the on-line application. One problem is
the high cost of buffer initialization associated with using PVM. Each
PVM_INITSEND command [Ref. 1] initializes a buffer in which to pack the output
data. This cost is fixed and is independent of the amount of data to be sent. With many
relatively small messages, this initialization time became an important factor in process
execution time because the cost accumulates with every message.
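
The effect of a fixed per-message cost can be illustrated with a simple linear cost model. The sketch below is an assumption made for illustration only: the cost values are hypothetical placeholders rather than measurements from this study, and `send_cost` and `batched_cost` are not PVM routines.

```python
def send_cost(message_sizes, t_init, t_per_byte):
    """Total cost when every message is sent separately.

    Each message pays the fixed buffer-initialization cost t_init
    plus a cost proportional to its size.
    """
    return sum(t_init + size * t_per_byte for size in message_sizes)


def batched_cost(message_sizes, t_init, t_per_byte):
    """Total cost when all messages are packed into one initialized buffer."""
    return t_init + sum(message_sizes) * t_per_byte


# Hypothetical values: five small messages, costs in arbitrary time units.
sizes = [100, 200, 100, 400, 200]
T_INIT = 50.0
T_BYTE = 0.01

separate = send_cost(sizes, T_INIT, T_BYTE)
batched = batched_cost(sizes, T_INIT, T_BYTE)
print(f"separate = {separate}, batched = {batched}, saved = {separate - batched}")
```

Under this model the saving is the fixed cost multiplied by one less than the number of messages, which is why many small messages are affected most.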

Another problem occurred during program testing with added network loading. The
loading is discussed in Chapter V, but essentially a part of the forced network
communications caused a slave program to send multiple large messages to another.
This sometimes caused the PVM daemon process on the slave's host computer to die.
This occurrence has been reported before [Ref. 5], but it was not investigated because
the use of the PVM_ADVISE command [Ref. 1] eliminated the problem.

C.  BATCHING  OF  COMMUNICATION  COSTS 

In PVM-like systems, interprocessor communication has two distinct components:
one operating system (OS) related and one network related. The OS-related part makes
OS calls that consume processor cycles otherwise available to the application, and so
affects the throughput. This could be regarded as OS contention between nodes on the
same processor. The network-related part makes "available" processor cycles unusable
for a node that is trying to transmit or receive data, because other nodes have control
of the bus. This is network contention, and it leads to processor blocking. Given the
graph of Figure 4.1, with its multiple nodes on multiple processors, these contentions
can be numerous and can affect the desired throughput.

One  way  of  greatly  reducing  the  number  of  these  contentions  is  by  batching  the 
communication  for  each  processor.  Batching  communication  means  what  the  name 
implies,  taking  all  the  input  and  output  requirements  for  a  processor  and  giving  these 
tasks  to  one  and  only  one  node  assigned  to  the  processor.  In  order  to  accomplish  this,  it 
was  assumed  the  nodes  on  a  given  processor  could  communicate  using  UNIX  shared 
memory  and  that  such  communication  was  very  cheap  compared  to  PVM 
communication.  This  process  added  an  extra  node  on  each  processor  which  is 
analogous to the gathering node described in Chapter II for the PPT2 algorithms.

The  gathering  node  accesses  the  shared  memory  to  gather  the  output  data  for 
transmission.  It  will  also  access  the  shared  memory  to  place  the  input  data  upon 
reception.  To  do  this,  the  shared  memory  is  used  in  such  a  fashion  that  either  the 
graph  nodes  can  access  their  respective  memory  locations  or  the  gathering  node  can 
access  the  entire  memory,  but  not  both. 

D.  THREE  TECHNIQUES 

The nodes were studied using three different methods of process execution. Of
course, the overall graph execution was carried out in the sequence shown in Figure
4.1, but the sequence in which the nodes on each processor executed was manipulated.
The three methods used a master/slave relationship. The master program took care of
the PVM process spawning and then acted as either node P1A or Processor 1,
depending on the technique chosen. The master would initiate an iteration and then
wait to receive certain criteria from the slaves before proceeding to the next iteration.
All the programs were written in C and are in the appendices as mentioned below. The
three techniques are described as follows.

1.  Unscheduled  Node  Processing 

The  unscheduled  node  processing  method  let  each  node  begin  execution  upon 
receipt  of  data  and  communicate  upon  completion  of  execution.  No  attempt  was  made 
to  reduce  the  number  of  contentions  described  in  the  communication  batching  section. 
In this technique, there is a PVM_SEND for every message. The results of this scheme
were the baseline against which the following "improvements" were judged. The code
for this set of programs, one for each node, is in Appendix D.

2.  Scheduled  Node  Processing 

This method uses the scheduling method described in the last section of this
chapter. In essence, all of the nodes on a given processor are restricted to a certain
order in which they can execute, thereby reducing the number of OS contentions.
Shared memory use is assumed for communication between nodes on a processor. The
batching of the communication between processors and the scheduling of nodes on the
processors greatly reduce the network contentions. In this scheme, there is a
PVM_SEND for every pair of communicating processors. The code for this technique
is in Appendix E.

3.  Scheduled  Node  Processing  Using  Hardware  Multicasts 

This technique uses basically the same approach as the previous method, but
all communications are assumed to be passed between the nodes via hardware
multicasts. Thus, all the communication from a processor to all the other processors is
multicast at the hardware level by the sender's communication node. In this scheme,
there is one PVM_SEND per processor. This PVM_SEND is assumed to be a hardware
multicast to a group (which is not currently implemented in PVM). This further
reduces the network contentions since fewer PVM message calls are used. Hardware
multicasts were chosen because software multicasts greatly decreased the throughput.
The PVM_MCAST command [Ref. 1] multicasts at the user level, but at the OS level
the PVM daemons still handle the multiple sends and receives. PVM routes messages
either through the daemons or directly over TCP. Since recent TCP implementations
make use of hardware multicast for implementing user-level multicasts, the use of
hardware multicasts instead of the software commands was assumed. This is expected
to be true of future PVM implementations. The code for this method is in Appendix F.

To further clarify how network contention is reduced across the three techniques,
an example follows. In the graph of Figure 4.1, Processor 1 has three nodes. For
the unscheduled method, Processor 1 has to output a total of five times per period.
Using the scheduled technique, this number reduces to three. Then, by using the
hardware multicasts, this number reduces to one. Of course, as the number of message
pack and send calls is reduced, the message size increases. This grouping of multiple
messages reduces the number of times PVM has to initialize an output buffer,
cutting this component of the communications overhead. However, in the last
technique, every processor must unpack a larger message, which reduces the gain from
a hardware multicast.
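
The counts in this example can be reproduced with a small sketch. Figure 4.1 is not reproduced here, so the edge list below is hypothetical: it was chosen only so that three nodes on Processor 1 send five interprocessor messages to three distinct processors, matching the numbers quoted above. Node names other than P1A and the function name are assumptions.

```python
def sends_per_period(edges):
    """Count send calls per period for the three techniques.

    edges: list of (source_node, destination_processor) pairs for the
    interprocessor messages leaving one processor.
    """
    unscheduled = len(edges)                    # one send per message
    scheduled = len({dst for _, dst in edges})  # one send per destination processor
    multicast = 1 if edges else 0               # one multicast per sending processor
    return unscheduled, scheduled, multicast


# Hypothetical outgoing messages for the three nodes on Processor 1.
processor_1_edges = [
    ("P1A", "P2"),
    ("P1A", "P3"),
    ("P1B", "P2"),
    ("P1B", "P4"),
    ("P1C", "P3"),
]

counts = sends_per_period(processor_1_edges)
print(counts)
```

The reduction from the first count to the second comes from batching by destination processor, and the reduction to one comes from treating all destinations as a single multicast group.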
E. NODE SCHEDULING

The last two processing techniques mentioned above depend on the nodes being
constrained as to when they can execute and communicate. To reduce the OS
contentions, each node is "scheduled" on its respective processor so that each one has
its turn without blocking another node or being blocked itself. Once the nodes are
scheduled, it is instructive to think of their execution as taking place within one frame
of time slots. One of the assumptions, or constraints, applied to the hypothetical graph
is that the sum of the node execution and communication costs on each processor is
approximately equal. This sum is the period in which one frame of scheduled time slots
can be executed. To reduce the number of network contentions, the interprocessor
communication is also scheduled within each frame. Figure 4.2 shows a representative
frame of time slots with the nodes from Figure 4.1 assigned to their respective
execution positions.

Figure 4.2. Frame of Time Slots Starting at Time t_i.


Figure 4.2 shows the schedule of nodes for the ith frame. The node indices
indicate which frame of data each node is executing on in the current frame. For
instance, P1A, the root node, is working on new data received for this frame, and P5C,
the output node, is working on data the graph received nine frames ago (index i-9).
The schedule of nodes is one of many possibilities, but once the schedule was chosen
the indices used were unique.

To determine index j for node $x_j$, the following algorithm was applied.

Letting:

$t_x$ = the time node x executes within the period,
$t_{xp}$ = the time the parent of x executes within the period,
$t_{Pxc}$ = the time the processor x resides on communicates within the period,
$t_{Pxpc}$ = the time the processor the parent of x resides on communicates within
the period,
k = index of the parent of x.

If x is the graph root node, then j = i.

If x and the parent of x reside on the same processor:

    If $t_x < t_{xp}$ then j = k - 1.
    If $t_x > t_{xp}$ then j = k.

If x and the parent of x reside on different processors:

    If $t_{Pxc} < t_{Pxpc}$ then:

        If $t_x > t_{Pxc}$ and $t_{xp} < t_{Pxpc}$ then j = k - 1.
        If $t_x > t_{Pxc}$ and $t_{xp} > t_{Pxpc}$ then j = k - 2.
        If $t_x < t_{Pxc}$ and $t_{xp} < t_{Pxpc}$ then j = k - 2.
        If $t_x < t_{Pxc}$ and $t_{xp} > t_{Pxpc}$ then j = k - 3.

    If $t_{Pxc} > t_{Pxpc}$ then:

        If $t_x > t_{Pxc}$ and $t_{xp} < t_{Pxpc}$ then j = k.
        If $t_x > t_{Pxc}$ and $t_{xp} > t_{Pxpc}$ then j = k - 1.
        If $t_x < t_{Pxc}$ and $t_{xp} < t_{Pxpc}$ then j = k - 1.
        If $t_x < t_{Pxc}$ and $t_{xp} > t_{Pxpc}$ then j = k - 2.

If a node relies on more than one parent, then apply the above algorithm to each of
the parents and use the smallest of the calculated indices as the index for the
node x.
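
The index rules above can be sketched in a few lines. This is an illustrative restatement rather than the code used in the study. It treats each of the three conditions in the different-processor case as adding one frame of delay, which reproduces the eight listed cases. The function names, the keyword arguments, and the handling of exactly equal times (which the rules above do not cover) are assumptions.

```python
def index_from_parent(k, t_x, t_xp, same_processor, t_pxc=None, t_pxpc=None):
    """Frame index of node x as implied by one parent with index k."""
    if same_processor:
        # x runs before its parent in the frame, so it sees last frame's output.
        return k - 1 if t_x < t_xp else k
    delay = 0
    if t_pxc < t_pxpc:   # x's processor communicates before the parent's does
        delay += 1
    if t_xp > t_pxpc:    # parent runs after its processor has communicated
        delay += 1
    if t_x < t_pxc:      # x runs before its processor has communicated
        delay += 1
    return k - delay


def node_index(i, parents):
    """Index for a node in frame i; parents is a list of keyword dicts.

    The root node (no parents) keeps index i. With several parents the
    smallest calculated index is used.
    """
    if not parents:
        return i
    return min(index_from_parent(**p) for p in parents)


# Example: a node with one parent on another processor and one on its own.
i = 0
j = node_index(i, [
    dict(k=i, t_x=3, t_xp=4, same_processor=False, t_pxc=2, t_pxpc=5),
    dict(k=i - 1, t_x=3, t_xp=1, same_processor=True),
])
print(j)
```

Indices are represented here as integer offsets, so a result of -1 corresponds to index i-1 in the notation of Figure 4.2.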
The schedule represented by Figure 4.2 is not the only possible node scheduling
scheme, but it was the one chosen for this study. Determining an optimum
schedule with respect to graph latency was not pursued.

V. RESULTS FOR THE ON-LINE APPLICATION


The three node processing techniques described in the previous chapter were
implemented using PVM, and certain performance parameters were measured. The
execution and communications costs were measured, as was done for PPT2, and
theoretical values were obtained. The programs were run both when the network
utilization was low and when the network was purposefully loaded, in order to compare
and contrast the results.

A.  PARAMETERS  OF  INTEREST 

Throughput was the primary measurement studied. The values obtained were
normalized with respect to the theoretical costs as discussed below. In addition to
throughput, post-processing of the data was used to determine the size an output buffer
would need to be if there was a buffer between node P5C and the next stage of the
weapons system. The buffer was accessed at the average period, t. The standard
deviation, s, of the period was determined to clarify the results of the buffer
processing. For further statistical analysis, the coefficient of variation, V, which is
defined as the ratio s/t, was calculated. The scheduling represented by Figure 4.2
implies a graph latency of ten frames. Though this is a valid area of interest, output
latency was not studied.
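
The statistics named in this paragraph can be sketched as follows. The post-processing code used in the study is not reproduced in this section, so the buffer model is an assumption: one output arrives at the end of each measured period, and the next stage removes one item every average period. The sample standard deviation is also assumed, and the period values are made up for illustration.

```python
import statistics


def period_statistics(periods):
    """Return (average period t, standard deviation s, coefficient of variation V)."""
    t_mean = statistics.mean(periods)
    s = statistics.stdev(periods)  # sample standard deviation (assumed)
    return t_mean, s, s / t_mean


def buffer_occupancy(periods):
    """Buffer size just after each arrival, with reads at multiples of the mean period."""
    t_mean = statistics.mean(periods)
    occupancy = []
    arrival_time = 0.0
    size = 0
    reads_done = 0
    for period in periods:
        arrival_time += period
        # Perform every read that was due up to this arrival.
        reads_due = int(arrival_time // t_mean)
        while reads_done < reads_due:
            if size > 0:
                size -= 1
            reads_done += 1
        size += 1
        occupancy.append(size)
    return occupancy


# Illustrative period measurements in seconds (not data from the study).
periods = [0.5, 1.5, 0.5, 1.5]
t_mean, s, v = period_statistics(periods)
occupancy = buffer_occupancy(periods)
print(f"t = {t_mean:.3f}, s = {s:.3f}, V = {100 * v:.1f}%")
print("occupancy:", occupancy, "mean:", statistics.mean(occupancy))
```

A larger spread in the periods lets arrivals bunch together between reads, which is how a larger V leads to a larger buffer in this model.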

The theoretical period was determined by using the communications costs from
Figure 4.1 and the execution times for the nodes in the longest path. The execution
loop times and the message packing and sending times were measured on the ECE SUN
system in the same way the variables for PPT2 were determined. These numbers were
used in combination with the variable weighting factor used when the programs were
run to determine the theoretical period.

B. RESULTS WITHOUT NETWORK LOADING


The programs were run multiple times to determine what patterns, if any,
they exhibited. Table 5.1 shows the results of one such run. For this run, the
theoretical period was approximately 1.509 seconds. Even though empirical results are
presented here, all the values were dependent on the load variations in the SUN
network due to the other system users. While these values were obtained for this run,
another set of runs at a different time could yield different results. With this in
mind, more emphasis was placed on the trends and patterns observed than on the actual
values.

TABLE 5.1. TYPICAL ON-LINE RESULTS WITHOUT NETWORK LOADING.

Units = seconds                 Unscheduled   Scheduled   Multicast
Average period, t               0.871         0.996       1.001
Normalized average, t_n         57.7%         66.0%       66.3%
Standard deviation, s           0.339         0.0786      0.0729
Coefficient of variation, V     38.9%         7.89%       7.28%
Mean output buffer size, b      3.56          1.431       1.896
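
The normalized average and coefficient of variation rows appear to follow from the other rows: the normalized average matches the average period divided by the theoretical period of about 1.509 seconds, and V matches s/t. The short check below recomputes both from the tabulated values. The dictionary layout and names are illustrative.

```python
THEORETICAL_PERIOD = 1.509  # seconds, as quoted in the text for this run

# Values copied from Table 5.1 (t and s in seconds, the rest in percent).
table_5_1 = {
    "Unscheduled": {"t": 0.871, "s": 0.339, "normalized": 57.7, "V": 38.9},
    "Scheduled": {"t": 0.996, "s": 0.0786, "normalized": 66.0, "V": 7.89},
    "Multicast": {"t": 1.001, "s": 0.0729, "normalized": 66.3, "V": 7.28},
}

recomputed = {}
for name, row in table_5_1.items():
    normalized = 100 * row["t"] / THEORETICAL_PERIOD
    v = 100 * row["s"] / row["t"]
    recomputed[name] = (normalized, v)
    print(f"{name:12s} normalized = {normalized:.1f}%  V = {v:.2f}%")
```

Small differences in the last digit are expected because the theoretical period is only quoted approximately.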

From Table 5.1, the periods were slightly higher for the Scheduled and Multicast
techniques than for the Unscheduled method. This was the general trend for all the
runs. Another trend was that the standard deviation of the Unscheduled method was
three to six times larger than that of the two scheduling algorithms. This was readily
evident in the output, which showed a wide range of throughput values for the
Unscheduled technique and a narrower range for the other two. The output buffer
pattern observed was that the Unscheduled buffer size would be from two to five times
larger than the other two.

The buffer size was observed to be more oscillatory for the Unscheduled
processing than for the other two approaches. Figures 5.1, 5.2, and 5.3 show the buffer
size against the iteration number for a run of 1000 graph iteration cycles
using the three node processing techniques. These plots reinforce the buffer data
observations by showing the Unscheduled buffer size varying more, and growing larger,
than that of the Scheduled or Multicast methods.

C.  RESULTS  WITH  A  NETWORK  PERTURBATION 

A controlled load was added to the process runs for this section.
The loading consisted of one program manipulating large amounts of input/output,
around 3.5 Mbytes, and two other programs sending and receiving a large number of
fairly large messages. These load programs were assigned to the same processors used
by the graph nodes. The results from one of the runs are presented in Table 5.2.

This run was chosen because it presented some of the uncontrollable network
influences as well as the observed trends. One example of the network usage effects is
observed in the periods for the run prior to adding the load. The period for the
Unscheduled method is noticeably less than those of the other two methods, which is in
contrast with the data in Table 5.1. This is due to the network load variations at the
times the programs were executed.

Figure 5.1. Unscheduled Output Buffer Size.

Figure 5.2. Scheduled Output Buffer Size.

Figure 5.3. Hardware Multicast Output Buffer Size.

TABLE 5.2. TYPICAL ON-LINE RESULTS WITH NETWORK PERTURBATION.

Units = seconds      Unscheduled   Scheduled   Multicast

Before loading
  t                  0.784         0.996       1.072
  t_n                60.0%         66.0%       71.0%
  s                  0.256         --          0.2489
  V                  32.7%         8.12%       23.2%
  b                  1.78          1.587       2.14

During loading
  t                  1.094         1.611       1.566
  t_n                72.5%         106.8%      103.8%
  s                  0.277         0.1914      0.311
  V                  25.3%         11.88%      19.86%
  b                  3.21          1.575       2.84

The  observed  patterns  for  each  section  of  Table  5.2,  before  loading  and  during 
loading,  were  similar  to  those  described  in  the  previous  section.  The  most  prominent 
observation  for  this  run  comes  from  comparing  the  two  sections.  The  buffer  size  stayed 
relatively  constant  before  loading  and  during  loading  for  the  Scheduled  and  Multicast 
techniques,  but  the  Unscheduled  buffer  size  would  increase  by  two  to  five  times.  The 
periods  also  increased,  but  not  as  significantly. 

D.  ON-LINE  CONCLUSIONS 

The use of node scheduling did not adversely affect the periods compared to not
scheduling the nodes. While memory is cheap, and the buffer size may not be a
hardware problem, the buffer access time can be considerable compared to the
throughput. This could add an excessive delay to the overall throughput of the graph
as seen from the next stage after node P5C. This was the stimulus behind the buffer
consideration.

The hardware multicasting did not considerably improve the performance of the
node scheduling. This may be caused by the physical properties of the chosen graph,
because the nodes had to unpack one large message instead of a few smaller ones. The
smaller messages were received at various times, allowing the nodes to unpack them as
the data arrived instead of all at one time. Another factor which influenced the
Multicast performance was the fact that the physical hardware was not available and
PVM was used to simulate it.

VI. CONCLUSION


This thesis provided separate conclusions at the end of each application section.
Overall, using a software tool can effectively improve the performance of a given
procedure. PVM is becoming the standard parallelization software tool, and
it demonstrated its usefulness when applied to the off-line PPT2 program and the
on-line hypothetical combat weapons system.

A.  FUTURE  STUDY 

Further  work  is  required  in  the  following  areas: 

1.  The  PPT2  theoretical  optimum  block  size  could  be  studied  further. 

2.  A  larger  set  of  data  from  the  Naval  Space  Command  would  increase  the 
usefulness  of  the  results  presented.  If  possible,  the  use  of  the  block  distribution 
algorithm  and  PVM  on  the  actual  satellite  catalog  with  the  proper  number  of  iterations 
for  each  individual  record  would  better  demonstrate  a  real  scenario. 

3.  The  four  data  decomposition  schemes  presented  for  PPT2  are  basic  with 
numerous  possible  improvements.  One  such  variation  is  having  the  supervisor  send  an 
initial  block  of  data  to  each  worker,  divide  up  the  remaining  records  into  blocks,  then 
send  these  blocks.  Another  area  for  testing  is  the  use  of  multiple  supervisors  with  their 
own sets of workers implementing each of the schemes.

4.  The  way  in  which  the  network  was  loaded  for  the  on-line  application  was  not 
varied.  The  load  programs  ran  on  the  same  processors  the  node  programs  were  on. 
Further study of the effects of the load programs operating on different processors is
warranted.

5. The on-line application research just scratched the surface of the possibilities for
this area. The code for the node processing schemes was written so that the user can
easily vary the communications and execution costs. Though many program runs were
accomplished for this thesis, the varying of the costs was not fully studied.

APPENDIX A - ACQUIRING AND INSTALLING PVM


First send the message "send index from pvm3" to netlib@ornl.gov,
then follow the instructions.

Probably the quickest way to get the files is to use rcp.

finger anon@netlib2.cs.utk.edu

(* this command will explain how to copy files from the Netlib Software Repository *)

It will tell you to use: rcp anon@netlib2.cs.utk.edu:FILENAME LOCAL_FILENAME

Create the directory "pvm" where pvm is to be installed.

Then type the following commands from the pvm directory:

rcp anon@netlib2.cs.utk.edu:pvm3/filename .
or
rcp -r anon@netlib2.cs.utk.edu:pvm3/directory .

for all the files listed in the index.

At some point, the access modes for all files should be changed to allow all
users to read and execute them.

Next, type: sh pvm3.1.shar

This command will create the pvm3 subdirectory and extract the pvm files.

more pvm/pvm3/lib/cshrc.stub

This command shows a portion of code that needs to be appended to the installer's
.cshrc file. The "setenv PVM_ROOT" line must be modified to take into account the
current location of pvm.

pvm/pvm3/make all

This command will then compile the pvm source code. Look in the file Makefile for
individual options if you do not want to install everything. If errors occur, the
Makefile.body needs to be modified and then ../lib/UpdateMk needs to be run. For
instance, in the file Makefile in the xep subdirectory, the xcflags path had to be
changed in order to get xep installed (this only occurred when installing 3.1; there
were no problems installing 3.2).

APPENDIX B - BLOCK DECOMPOSITION SCHEME PROGRAMS

SUPERVISOR (MASTER) PROGRAM:

      program thesis1m
      include 'fpvm3.h'

c ------------------------------------------------------------------
c Fortran Master program to solve the NAVASPASUR satellite
c orbit prediction problem.
c This program reads the data from a file and distributes the data to
c the working nodes one block at a time.
c ------------------------------------------------------------------

      implicit real*8 (a-h,o-z)
      character*16 filenm
      integer pid, bsz
      integer eof, gattime(60)
      integer start, finish, endtime, Gettime
      external Gettime !$pragma C( gettime )
      integer i, info, nproc, iter
      integer mytid, tids(0:40), slvtime(16)
      integer who
      character*12 nodename
      character*8 arch

      common/bloc/sat(84,8000)

c     msglen = 672 bytes = one record of 84 real*8 values
      data istop/1/,pid/0/,msglen/672/
      data isat/1/,n/1/

*     Enroll this program in PVM
      call pvmfmytid( mytid )

c ---- Starting up all the tasks ----

*     Initiate nproc instances of thesis1s slave program
      print *, 'How many working slave programs (1-16)?'
      read *, nproc
c     One extra task is spawned to act as the gathering node
      nproc = nproc + 1
      print *, ' '
      print *, 'Which input file?'
      read *, filenm
      print *, 'What blocksize?'
      read *, bsz
      print *, 'How many iterations?'
      read *, anum

* ---- Read complete catalog of satellite data ----
      open(10,file=filenm)
      read(10,*,iostat=eof)(sat(j,1),j=1,84)
      isat = 0
c     Keep reading until end of file or a read error is reported
 200  if(eof.eq.0) then
         isat = isat+1
         read(10,*,iostat=eof)(sat(j,isat+1),j=1,84)
         go to 200
      endif
      close(10)

c     Print the number of satellite records received.
      print *, 'isat = ', isat
      print *, ' '

*     If arch is set to '*' then ANY configured machine is acceptable
*     otherwise arch should be set to architecture type you wish to use.
      nodename = 'thesis1s'
      arch = '*'
      call pvmfspawn( nodename, 0, arch, nproc, tids, info )

c ---- ** Begin user program ** ----

c ---- Get beginning time stamp
      start = Gettime( start )

*     send number of satellites and slave id array to slaves
      msgtype = 2
      call pvmfinitsend( 0, info )
      call pvmfpack( INTEGER4, nproc, 1, 1, info )
      call pvmfpack( INTEGER4, tids, nproc, 1, info )
      call pvmfpack( INTEGER4, isat, 1, 1, info )
      call pvmfpack( INTEGER4, bsz, 1, 1, info )
      call pvmfpack( INTEGER4, anum, 1, 1, info )
      call pvmfmcast( nproc, tids, msgtype, info )

*     broadcast data to all node programs
c     Determine how many records in each block
      iter = isat/((nproc-1)*bsz)
      iterx = mod(isat/bsz,nproc-1)

      do 305 j=1,bsz
         do 300 i=1,nproc-1
c           The first iterx workers receive one extra record
            if(i .le. iterx) then
               item = iter + 1
            else
               item = iter
            endif
            call pvmfinitsend( 0, info )
            call pvmfpack( BYTE1, sat(1,n), msglen*item, 1, info )
            call pvmfsend( tids(i), 3, info )
 300     n = n+item
 305  continue

c     wait for data completion signal from gathering node
      call pvmfrecv( tids(0), 4, info )
      call pvmfunpack( INTEGER4, gattime, 33, 1, info )

c ---- Collect ending time stamp ----
      finish = Gettime( finish )
      endtime = finish - start
      print *, 'The end to end runtime is ', endtime, ' usecs.'

c ---- End user program ----

c     program finished leave PVM before exiting
      call pvmfexit(info)
      stop
      end
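
The block-size arithmetic in the supervisor listing above can be traced with a short sketch. It mirrors the integer expressions for iter and iterx as read from the listing. The Python function names are illustrative, and the value 6 for the blocksize input is only an example, combined here with the 6795 catalog records and eight workers mentioned in Chapter III.

```python
def block_sizes(isat, nworkers, bsz):
    """Records per message for workers 1..nworkers, using integer division.

    Mirrors: iter = isat/((nproc-1)*bsz), iterx = mod(isat/bsz, nproc-1),
    where nworkers corresponds to nproc-1 in the listing.
    """
    base = isat // (nworkers * bsz)
    extra = (isat // bsz) % nworkers
    return [base + 1 if worker <= extra else base
            for worker in range(1, nworkers + 1)]


def total_records_sent(isat, nworkers, bsz):
    """Records covered after bsz passes over all workers."""
    return bsz * sum(block_sizes(isat, nworkers, bsz))


# Example inputs: 6795 records and 8 workers from the text, bsz = 6 assumed.
sizes = block_sizes(6795, 8, 6)
total = total_records_sent(6795, 8, 6)
print(sizes, total)
```

In this sketch the computed block sizes cover bsz * (isat // bsz) records, so a remainder of fewer than bsz records is not included in any block.
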
WORKER (SLAVE) PROGRAM:

      program thesis1s
      include 'fpvm3.h'

c ------------------------------------------------------------------
c Fortran Slave program to solve the NAVASPASUR satellite
c orbit prediction problem. The slave program consists of two nodes.
c One node is a position calculation node and the other is the data
c collection node. This version of the master-slave configuration
c receives blocks of data from the master and performs calculations.
c ------------------------------------------------------------------

      implicit real*8 (a-h,o-z)
      real*8 kf
      integer pid, me, nproc, bsz, gattime(60)
      integer start, finish, endtime, Gettime
      external Gettime !$pragma C( gettime )
      integer info, mytid, mtid, msgtype
      integer tids(0:40)

c     Several names in /dcsub/ and /force/ are a best reading of a
c     poorly legible listing; these blocks belong to the PPT2 code.
      common/cons/a(64)
      common/ppt/f(25),oec(10),kf(10),cf(10),bs(3,4),u(3),v(3),w(3),r,
     &   vel(3),dind,tm,dkz,dident
      common/dcsub/pe(6,8),e(8,8),q(8,8),g(8),gp(8),ifti(8),ifto(8),
     &   iteri,itero,iof,iol,stat(20),tol(6),iw,on(11),ow(8,8)
      common/force/rho(3),tos,hdr,hdv,tdv,del,iter
      common/bloc/sat(84,8000)

      data istop/1/,pid/0/,msglen/672/
      data isat/1/,n/1/

c —  Emoll  this  program  in  PVM - 

call  pvmfinytid(  mytid  ) 

c  Get  the  master's  task  id 
call  pvmQparent(  mtid  ) 


c - ♦♦  Begin  user  program  ** - 

c  Recdve  data  from  host 
c«Il  pvmfiecv(  mtid,  2,  info  ) 
call  pvmfunpack(  INTEGER4,  nproc,  1, 1,  info  ) 
call  pvmfuiqMck(  INTEGER4,  tids,  nproc,  1,  info ) 
call  pvmfun^k(  INTEGER4,  isat,  1,  1,  info  ) 
call  pvmfimpadt(  INTEGER4,  bsz,  1,  1,  info  ) 
call  pvmfuapack(  INTEGER4,  anum,  1, 1,  info  ) 
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c  Detanniiie  which  slave  I  am 
^  ck>  10  i»0,  qMOC-1 

if(  tids(i)  .eq.  mytid  )  me  »  i 
10  continue 


c     Determine if I am the gathering slave
      if( tids(0) .eq. mytid ) then

c     Execute the gathering node

*     Begin Collecting Node

         msgtype = 10
         k = 1

         do 1000, i=1,(nproc-1)*bsz
            call pvmfrecv( -1, msgtype, info )
            call pvmfunpack( INTEGER4, iter, 1, 1, info )
            call pvmfunpack( BYTE1, sat(1,k), msglen*iter, 1, info )
            k = (i)*iter+1
 1000    continue

c     Commented out since I/O time was not considered

*     Write results to external file

*        open(6,file='/home3/stone/pvm3/bin/SUN4/thesis1.out')
*        do 1231 i=1,isat
*1231    write(6,*)(sat(j,i),j=1,84)
*        close(6)

*     Send message to Host that process is complete

         call pvmfinitsend( 0, info )
         call pvmfpack( INTEGER4, gattime, 33, 1, info )
         call pvmfsend( mtid, msgtype, info )

*     End Collecting Node


c ----- Begin Working Nodes -----
      else

*     Determine my block size
         iter  = isat/((nproc-1)*bsz)
         iterx = mod(isat/bsz,nproc-1)

c ----- Beginning time stamp -----
         start = Gettime( start )

         call consl
         msgtype = 3

         do 1405 j=1,bsz

            if( me .le. iterx ) then
               item = iter + 1
            else
               item = iter
            endif

            k = (j-1)*iter+1

            call pvmfrecv( mtid, msgtype, info )
            call pvmfunpack( BYTE1, sat(1,k), msglen*item, 1, info )

            do 1400 i=1,item

c ----- Receive satellite to process -----
               do 1380 n=1,84
 1380          f(n) = sat(n,k+i-1)

*     Set parameters for subroutine ppt2
               ind = 1
               kz  = idint(dkz)

*     Compute secular recovery
               call ppt2(ind,kz)

*     Compute subsequent task, i.e. predict position, update elements
               ind = idint(dind)
               call ppt2(ind,kz)

               do 1390 n=1,84
 1390          sat(n,k+i-1) = f(n)

 1400       continue

c ----- Send computed results to gathering node -----
            call pvmfinitsend( 0, info )
            call pvmfpack( INTEGER4, item, 1, 1, info )
            call pvmfpack( BYTE1, sat(1,k), msglen*item, 1, info )
            call pvmfsend( tids(0), 10, info )

 1405    continue

c ----- Ending time stamp -----
         finish  = Gettime( finish )
         endtime = finish - start

c     The following was used for trouble shooting and processor comparison
*        call pvmfadvise( PvmRouteDirect, info )
*        msgtype = 25
*        call pvmfinitsend( 0, info )
*        call pvmfpack( INTEGER4, endtime, 1, 1, info )
*        call pvmfsend( mtid, msgtype, info )

      end if

c ----- End user program -----

c     Program finished. Leave PVM before exiting
      call pvmfexit(info)
      stop
      end
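The block-size arithmetic of the working nodes (iter, iterx and item in the listing) can be restated in a few lines of C. The sketch below is illustrative only and is not part of the thesis code: the function names `block_items` and `block_start` are hypothetical, and the test `me <= iterx` assumes that the garbled comparison in the listing reads `me .le. iterx`.

```c
#include <assert.h>

/* Number of satellites one working slave handles in each block.
   isat  : total number of satellites
   nproc : number of slaves, including the gathering slave (index 0)
   bsz   : number of blocks the data set is split into
   me    : index of this working slave, 1 .. nproc-1               */
int block_items(int isat, int nproc, int bsz, int me)
{
    assert(nproc > 1 && bsz > 0 && me >= 1 && me < nproc);
    int workers = nproc - 1;
    int iter  = isat / (workers * bsz);   /* base share per block        */
    int iterx = (isat / bsz) % workers;   /* leftover satellites a block */
    return (me <= iterx) ? iter + 1 : iter;
}

/* 1-based column of sat(:,k) at which block j (1-based) is stored. */
int block_start(int isat, int nproc, int bsz, int j)
{
    assert(nproc > 1 && bsz > 0 && j >= 1);
    int iter = isat / ((nproc - 1) * bsz);
    return (j - 1) * iter + 1;
}
```

For example, 100 satellites, 5 slaves and 2 blocks give iter = 12 and iterx = 2, so slaves 1 and 2 receive 13 satellites per block and slaves 3 and 4 receive 12, which accounts for all 50 satellites of a block.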


*DECK PPT2

NAVAL SPACE COMMAND PROPRIETARY CODE. SEE
PROFESSOR B. NETA, NAVAL POSTGRADUATE
SCHOOL, FOR ACCESS TO THE PPT2 SOURCE CODE.
APPENDIX C - EMPIRICAL VALUES FOR PPT2 VARIABLES


The ECE network results presented here were obtained using SUN/SPARC IPX and
SUN/SPARC II stations. Essentially no difference in output values was observed
between the two stations.




APPENDIX D - UNSCHEDULED NODE PROCESSING PROGRAMS


MASTER PROGRAM (NODE P1A):

Unscheduled master program, also handles node P1A communication and execution
requirements.

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define done_lp 1000    /* Loop iteration counter */
#define snl     400     /* Iteration number for start of network loading */
#define dosize  300     /* Size of noise message */

/* GLOBAL variables */
int done = 0;
int who;
int pnum;
double data_mat[5000];
double do_again;
int ld_num = 55000;

/* SLAVE VARIABLES */
int comm_gain;          /* For varying the communication weights */
int my_wt;              /* For varying the execution weights */

/* The next variables contain the branch communication */
/* and are defined as, using in4df1a, input to node    */
/* P4D from P1A                                        */
int in4df1a = 300;
int in2bf1a = 300;
int in5af1a = 300;
int in2af4d = 400;
int in2df2b = 400;
int in4af5a = 900;
int in4cf2a = 350;
int in3af2a = 350;
int in3bf2d = 350;
int in4bf2d = 350;      /* value illegible in the source ("3^"); 350 assumed */
int in1bf3a = 525;
int in5bf3b = 525;
int in3cf4c = 350;
int in3cf1b = 300;
int in2cf5b = 300;
int in2cf4b = 350;
int in1cf4a = 350;
int in5cf1c = 300;
int in5cf2c = 300;
int in5cf3c = 300;
main()
{
    char SLAVENAME[3];
    int i, k;
    int mum, mumtot = 0;
    int n_tm_ittui = 0, wnl = 0;
    int nl_done = 0;
    char myname[5];
    int nproc = 16;
    int mytid;                  /* my task id */
    int tids[20], stids[5];     /* slave task ids */
    int snproc = 3;
    struct itimerval tmrval;

    /* read in communication and execution scale factors */
    printf("\nComm wt = ");
    scanf("%d", &comm_gain);
    printf("\nEx wt = ");
    scanf("%d", &my_wt);
    my_wt = my_wt*4;

    /* use loading or not */
    printf("\nWith NET loading type 1, without NET loading type 2: ");
    scanf("%d", &wnl);

    /* initialize matrices */
    for (k = 0; k < 1500; k++)
        data_mat[k] = (double)k + 5.66666;

    /* enroll in pvm */
    mytid = pvm_mytid();

    /* start up slave tasks */
    nproc = 16;
    gethostname(myname, 5);

    pvm_spawn("p1b", NULL, 1, myname, 1, &tids[0]);
    pvm_spawn("p1c", NULL, 1, myname, 1, &tids[1]);
    pvm_spawn("p2a", NULL, 1, "sun3", 1, &tids[2]);
    pvm_spawn("p2b", NULL, 1, "sun3", 1, &tids[3]);
    pvm_spawn("p2c", NULL, 1, "sun3", 1, &tids[4]);
    pvm_spawn("p2d", NULL, 1, "sun3", 1, &tids[5]);
    pvm_spawn("p3a", NULL, 1, "sun8", 1, &tids[6]);
    pvm_spawn("p3b", NULL, 1, "sun8", 1, &tids[7]);
    pvm_spawn("p3c", NULL, 1, "sun8", 1, &tids[8]);
    pvm_spawn("p4a", NULL, 1, "sun9", 1, &tids[9]);
    pvm_spawn("p4b", NULL, 1, "sun9", 1, &tids[10]);
    pvm_spawn("p4c", NULL, 1, "sun9", 1, &tids[11]);
    pvm_spawn("p4d", NULL, 1, "sun9", 1, &tids[12]);
    pvm_spawn("p5a", NULL, 1, "sun20", 1, &tids[13]);
    pvm_spawn("p5b", NULL, 1, "sun20", 1, &tids[14]);
    pvm_spawn("p5c", NULL, 1, "sun20", 1, &tids[15]);
    /* Send initial book keeping data to all the slaves */
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&nproc, 1, 1);
    pvm_pkint(tids, nproc, 1);
    pvm_pkint(&my_wt, 1, 1);
    pvm_pkint(&comm_gain, 1, 1);
    pvm_mcast(tids, nproc, 20);

    /* Send the input and output costs for each slave */
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&in1bf3a, 1, 1);
    pvm_pkint(&in3cf1b, 1, 1);
    pvm_send(tids[0], 25);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&in1cf4a, 1, 1);
    pvm_pkint(&in5cf1c, 1, 1);
    pvm_send(tids[1], 25);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&in2af4d, 1, 1);
    pvm_pkint(&in4cf2a, 1, 1);
    pvm_send(tids[2], 25);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&in2bf1a, 1, 1);
    pvm_pkint(&in2df2b, 1, 1);
    pvm_send(tids[3], 25);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&in2cf4b, 1, 1);
    pvm_pkint(&in2cf5b, 1, 1);
    pvm_pkint(&in5cf2c, 1, 1);
    pvm_send(tids[4], 25);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&in2df2b, 1, 1);
    pvm_pkint(&in3bf2d, 1, 1);
    pvm_send(tids[5], 25);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&in3af2a, 1, 1);
    pvm_pkint(&in1bf3a, 1, 1);
    pvm_send(tids[6], 25);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&in3bf2d, 1, 1);
    pvm_pkint(&in5bf3b, 1, 1);
    pvm_send(tids[7], 25);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&in3cf4c, 1, 1);
    pvm_pkint(&in3cf1b, 1, 1);
    pvm_pkint(&in5cf3c, 1, 1);
    pvm_send(tids[8], 25);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&in4af5a, 1, 1);
    pvm_pkint(&in1cf4a, 1, 1);
    pvm_send(tids[9], 25);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&in4bf2d, 1, 1);
    pvm_pkint(&in2cf4b, 1, 1);
    pvm_send(tids[10], 25);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&in4cf2a, 1, 1);
    pvm_pkint(&in3cf4c, 1, 1);
    pvm_send(tids[11], 25);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&in4df1a, 1, 1);
    pvm_pkint(&in2af4d, 1, 1);
    pvm_send(tids[12], 25);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&in5af1a, 1, 1);
    pvm_pkint(&in4af5a, 1, 1);
    pvm_send(tids[13], 25);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&in5bf3b, 1, 1);
    pvm_pkint(&in2cf5b, 1, 1);
    pvm_send(tids[14], 25);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&in5cf1c, 1, 1);
    pvm_pkint(&in5cf2c, 1, 1);
    pvm_pkint(&in5cf3c, 1, 1);
    pvm_send(tids[15], 25);

    /* If want loading */
    if (wnl == 1)
    {
        pvm_spawn("s1", NULL, 1, "sun3", 1, &stids[0]);
        pvm_spawn("s2", NULL, 1, "sun20", 1, &stids[1]);
        pvm_spawn("s3", NULL, 1, "sun9", 1, &stids[2]);

        pvm_initsend(PvmDataDefault);
        pvm_pkint(&snproc, 1, 1);
        pvm_pkint(stids, snproc, 1);
        pvm_mcast(stids, snproc, 62);
    }

    /* Begin User Program */
    for (done = 0; done < done_lp; done++)
    {
        if (done == snl && wnl == 1)
        {
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&ld_num, 1, 1);
            pvm_send(stids[0], 22);
        }

        if (done == done_lp - 1)
            data_mat[0] = -444.555;

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize*comm_gain, 1);
        pvm_send(tids[3],       /* remainder of this call is missing in the source */

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize*comm_gain, 1);
        pvm_send(tids[12], 13);

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize*comm_gain, 1);
        pvm_send(tids[13], 14);

        for (k = 0; k < my_wt*2; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        printf("\n On loop number %d\n", done);

        pvm_recv(tids[1], 25);
        pvm_upkdouble(&do_again, 1, 1);
    } /* end of for loop */

    printf("\nThe loop is done\n");

    /* Ensure all slaves have quit prior to termination */
    pvm_recv(tids[15], 35);
    pvm_upkdouble(&do_again, 1, 1);

    printf("\nProgram m1.c cwt = %d, ewt = %d is done\n", comm_gain, my_wt/4);

    /* Program Finished exit PVM before stopping */
    pvm_exit();

} /** END OF MAIN PROGRAM **/
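The branch weights in the master program follow the naming pattern stated in its comments: in4df1a is the input to node P4D from node P1A. As a reading aid, the following C sketch decodes such a name into its destination and source nodes. It is not part of the thesis code; the type `struct edge` and the function `parse_edge` are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* One edge of the data flow graph: a message into node dst from node src. */
struct edge {
    char dst[3];    /* e.g. "4D" */
    char src[3];    /* e.g. "1A" */
};

/* Decode a weight name of the form in<dst>f<src>, e.g. "in4df1a".
   Returns 0 on success, -1 if the name does not match the pattern. */
int parse_edge(const char *name, struct edge *e)
{
    assert(name != NULL && e != NULL);
    if (strlen(name) != 7 || strncmp(name, "in", 2) != 0 || name[4] != 'f')
        return -1;
    if (!isdigit((unsigned char)name[2]) || !isalpha((unsigned char)name[3]) ||
        !isdigit((unsigned char)name[5]) || !isalpha((unsigned char)name[6]))
        return -1;
    e->dst[0] = name[2];
    e->dst[1] = (char)toupper((unsigned char)name[3]);
    e->dst[2] = '\0';
    e->src[0] = name[5];
    e->src[1] = (char)toupper((unsigned char)name[6]);
    e->src[2] = '\0';
    return 0;
}
```

Applied to the twenty weights declared in the master program, this yields the twenty data edges of the graph, for instance in5cf3c as the edge from P3C into P5C.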
THE INDIVIDUAL SLAVE PROGRAMS:

/************************************************************
   Slave program for node P1B
************************************************************/

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define parent 6
#define child  8

main()
{
    int i, k, mytid, master;
    int tids[20];
    int nproc, msgtype, me;
    int mum, mumtot=0, done=0, disize, dosize;
    double data_mat[10000];
    int my_wt, comm_gain;

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 20);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 25);
    pvm_upkint(&disize, 1, 1);
    pvm_upkint(&dosize, 1, 1);

    for (i=0; i<nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    while (done == 0)
    {
        pvm_recv(tids[parent], me+1);
        pvm_upkdouble(data_mat, disize*comm_gain, 1);

        if (data_mat[0] < 0)
            done = 1;

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize*comm_gain, 1);
        pvm_send(tids[child], child+1);

    } /* end of while done == 0 loop */

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

} /* End of Slave program p1b.c */
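Every slave locates its own position in the task id array sent by the master and then uses that position to form message tags (a slave with index me receives data under tag me+1). The C sketch below isolates those two steps. It is illustrative only: `find_rank` and `data_tag` are hypothetical helper names, and returning -1 for a task id that is not in the array is an addition, since the listings leave me unset in that case.

```c
#include <assert.h>

/* Return the index of mytid in tids[0..nproc-1], or -1 if it is absent. */
int find_rank(const int *tids, int nproc, int mytid)
{
    assert(tids != 0 && nproc >= 0);
    for (int i = 0; i < nproc; i++)
        if (tids[i] == mytid)
            return i;
    return -1;
}

/* Message tag used for data sent to the slave with index dest. */
int data_tag(int dest)
{
    assert(dest >= 0);
    return dest + 1;
}
```

With this convention node P1B, whose child is index 8, sends under tag data_tag(8) = 9, which is the tag the slave at index 8 waits for.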
/************************************************************
   Slave program for node P1C
************************************************************/

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define parent 9
#define child  15

main()
{
    int i, k, mytid, master;
    int tids[20];
    int nproc, msgtype, me, disize, dosize;
    int mum, mumtot=0, done=0;
    double data_mat[10000];
    double do_again = 2.2;
    int my_wt, comm_gain;

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 20);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 25);
    pvm_upkint(&disize, 1, 1);
    pvm_upkint(&dosize, 1, 1);

    for (i=0; i<nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    while (done == 0)
    {
        pvm_recv(tids[parent], me+1);
        pvm_upkdouble(data_mat, disize*comm_gain, 1);

        if (data_mat[0] < 0)
            done = 1;

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize*comm_gain, 1);
        pvm_send(tids[child], child+1);

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(&do_again, 1, 1);
        pvm_send(master, 25);

    } /* end of while done == 0 loop */

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

} /* End of Slave program p1c.c */
/************************************************************
   Slave program for node P2A
************************************************************/

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define parent 12
#define child1 11
#define child2 6

main()
{
    int i, k, mytid, master, disize, dosize;
    int tids[20], my_wt, comm_gain;
    int nproc, msgtype, me;
    int mum, mumtot=0, done=0;
    double data_mat[10000];

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 20);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 25);
    pvm_upkint(&disize, 1, 1);
    pvm_upkint(&dosize, 1, 1);

    for (i=0; i<nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    while (done == 0)
    {
        pvm_recv(tids[parent], me+1);
        pvm_upkdouble(data_mat, disize*comm_gain, 1);

        if (data_mat[0] < 0)
            done = 1;

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize*comm_gain, 1);
        pvm_send(tids[child1], child1+1);

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize*comm_gain, 1);
        pvm_send(tids[child2], child2+1);

    } /* end of while done == 0 loop */

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

} /* End of Slave program p2a.c */
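The slave programs shut down through the data itself: on its last iteration the master stores a negative value in data_mat[0], and each slave sets done when it unpacks a block whose first element is negative. Because the execution core writes only elements 1 through 1360, the negative first element survives the core and travels on to the children. The C sketch below isolates this behaviour without any PVM calls. It is illustrative only: `is_stop`, `execution_core`, `CORE_LEN` and `STOP_VALUE` are hypothetical names, and the accumulator is unsigned here (an assumption made to avoid signed overflow), whereas the listings use a plain int.

```c
#include <assert.h>
#include <stdlib.h>

#define CORE_LEN   1360         /* inner loop bound used in the listings     */
#define STOP_VALUE (-444.555)   /* value the master stores on its last pass  */

/* Nonzero when the received block carries the stop sentinel. */
int is_stop(const double *data)
{
    assert(data != NULL);
    return data[0] < 0;
}

/* Synthetic work: my_wt passes over elements 1..CORE_LEN of data.
   Element 0 is never written. Returns the accumulated rand() total. */
unsigned long execution_core(double *data, int my_wt)
{
    assert(data != NULL && my_wt >= 0);
    unsigned long total = 0;
    for (int k = 0; k < my_wt; k++)
        for (int i = 0; i < CORE_LEN; i++) {
            total += (unsigned long)rand();
            data[i + 1] = i + 1;
        }
    return total;
}
```

A buffer passed to `execution_core` must hold at least CORE_LEN + 1 doubles, which the 10000-element arrays of the slave programs satisfy.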
/************************************************************
   Slave program for node P2D
************************************************************/

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define parent 3
#define child1 7
#define child2 10

main()
{
    int i, k, mytid, master;
    int tids[20], my_wt, comm_gain;
    int nproc, msgtype, me, disize, dosize;
    int mum, mumtot=0, done=0;
    double data_mat[10000];

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 20);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 25);
    pvm_upkint(&disize, 1, 1);
    pvm_upkint(&dosize, 1, 1);

    for (i=0; i<nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    while (done == 0)
    {
        pvm_recv(tids[parent], me+1);
        pvm_upkdouble(data_mat, disize*comm_gain, 1);

        if (data_mat[0] < 0)
            done = 1;

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize*comm_gain, 1);
        pvm_send(tids[child1], child1+1);

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize*comm_gain, 1);
        pvm_send(tids[child2], child2+1);

    } /* end of while done == 0 loop */

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

} /* End of Slave program p2d.c */
/************************************************************
   Slave program for node P3A
************************************************************/

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define parent 2
#define child  0

main()
{
    int i, k, mytid, master;
    int tids[20], my_wt, comm_gain, disize, dosize;
    int nproc, msgtype, me;
    int mum, mumtot=0, done=0;
    double data_mat[10000];

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 20);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 25);
    pvm_upkint(&disize, 1, 1);
    pvm_upkint(&dosize, 1, 1);

    for (i=0; i<nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    while (done == 0)
    {
        pvm_recv(tids[parent], me+1);
        pvm_upkdouble(data_mat, disize*comm_gain, 1);

        if (data_mat[0] < 0)
            done = 1;

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize*comm_gain, 1);
        pvm_send(tids[child], child+1);

    } /* end of while done == 0 loop */

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

} /* End of Slave program p3a.c */
/************************************************************
   Slave program for node P3B
************************************************************/

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define parent 5
#define child  14

main()
{
    int i, k, mytid, master, disize, dosize;
    int tids[20], my_wt, comm_gain;
    int nproc, msgtype, me;
    int mum, mumtot=0, done=0;
    double data_mat[10000];

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 20);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 25);
    pvm_upkint(&disize, 1, 1);
    pvm_upkint(&dosize, 1, 1);

    for (i=0; i<nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    while (done == 0)
    {
        pvm_recv(tids[parent], me+1);
        pvm_upkdouble(data_mat, disize*comm_gain, 1);

        if (data_mat[0] < 0)
            done = 1;

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize*comm_gain, 1);
        pvm_send(tids[child], child+1);

    } /* end of while done == 0 loop */

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

} /* End of Slave program p3b.c */
/************************************************************
   Slave program for node P3C
************************************************************/

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define parent1 11
#define parent2 0
#define child   15

main()
{
    int i, k, mytid, master;
    int tids[20], my_wt, comm_gain;
    int nproc, msgtype, me;
    int mum, mumtot=0, done=0;
    double data_mat[10000];
    int disize, disize2, dosize;

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 20);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 25);
    pvm_upkint(&disize, 1, 1);
    pvm_upkint(&disize2, 1, 1);
    pvm_upkint(&dosize, 1, 1);

    my_wt = my_wt*2;

    for (i=0; i<nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    while (done == 0)
    {
        pvm_recv(tids[parent1], me+1);
        pvm_upkdouble(data_mat, disize*comm_gain, 1);

        pvm_recv(tids[parent2], me+1);
        pvm_upkdouble(data_mat, disize2*comm_gain, 1);

        if (data_mat[0] < 0)
            done = 1;

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize*comm_gain, 1);
        pvm_send(tids[child], child+1);

    } /* end of while done == 0 loop */

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

} /* End of Slave program p3c.c */
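Each message length in these programs is a branch weight multiplied by comm_gain, which is entered at run time, while the slave buffers are fixed at 10000 doubles (5000 in the master). The listings do not check that the product fits. The C sketch below shows one way such a check could be written. It is an illustration under stated assumptions, not part of the thesis code: the function name `scaled_count` and its -1 error convention are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Number of doubles to pack or unpack for a branch weight base scaled
   by gain, or -1 if the result would not fit in capacity doubles. */
int scaled_count(int base, int gain, int capacity)
{
    assert(capacity >= 0);
    if (base < 0 || gain < 0)
        return -1;
    if (gain != 0 && base > INT_MAX / gain)
        return -1;                      /* product would overflow int */
    int n = base * gain;
    return (n <= capacity) ? n : -1;
}
```

For the largest weight in the master program, 900, a slave buffer of 10000 doubles is exceeded once comm_gain reaches 12, since 900 * 12 = 10800.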
/************************************************************
   Slave program for node P4A
************************************************************/

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define parent 13
#define child  1

main()
{
    int i, k, mytid, master;
    int tids[20], my_wt, comm_gain;
    int nproc, msgtype, me, disize, dosize;
    int mum, mumtot=0, done=0;
    double data_mat[10000];

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 20);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 25);
    pvm_upkint(&disize, 1, 1);
    pvm_upkint(&dosize, 1, 1);

    for (i=0; i<nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    while (done == 0)
    {
        pvm_recv(tids[parent], me+1);
        pvm_upkdouble(data_mat, disize*comm_gain, 1);

        if (data_mat[0] < 0)
            done = 1;

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize*comm_gain, 1);
        pvm_send(tids[child], child+1);

    } /* end of while done == 0 loop */

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

} /* End of Slave program p4a.c */
/************************************************************
   Slave program for node P4B
************************************************************/

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define parent 5
#define child  4

main()
{
    int i, k, mytid, master;
    int tids[20], my_wt, comm_gain;
    int nproc, msgtype, me, disize, dosize;
    int mum, mumtot=0, done=0;
    double data_mat[10000];

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 20);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 25);
    pvm_upkint(&disize, 1, 1);
    pvm_upkint(&dosize, 1, 1);

    for (i=0; i<nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    while (done == 0)
    {
        pvm_recv(tids[parent], me+1);
        pvm_upkdouble(data_mat, disize*comm_gain, 1);

        if (data_mat[0] < 0)
            done = 1;

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize*comm_gain, 1);
        pvm_send(tids[child], child+1);

    } /* end of while done == 0 loop */

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

} /* End of Slave program p4b.c */




/************************************************************
   Slave program for node P4C
************************************************************/

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define parent 2
#define child  8

main()
{
    int i, k, mytid, master;
    int tids[20], my_wt, comm_gain;
    int nproc, msgtype, me, disize, dosize;
    int mum, mumtot=0, done=0;
    double data_mat[10000];

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 20);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 25);
    pvm_upkint(&disize, 1, 1);
    pvm_upkint(&dosize, 1, 1);

    for (i=0; i<nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    while (done == 0)
    {
        pvm_recv(tids[parent], me+1);
        pvm_upkdouble(data_mat, disize*comm_gain, 1);

        if (data_mat[0] < 0)
            done = 1;

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize*comm_gain, 1);
        pvm_send(tids[child], child+1);

    } /* end of while done == 0 loop */

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

} /* End of Slave program p4c.c */
/************************************************************
   Slave program for node P4D
************************************************************/

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define child 2

main()
{
    int i, k, mytid, master;
    int tids[20], my_wt, comm_gain;
    int nproc, msgtype, me, disize, dosize;
    int mum, mumtot=0, done=0;
    double data_mat[10000];

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 20);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 25);
    pvm_upkint(&disize, 1, 1);
    pvm_upkint(&dosize, 1, 1);

    for (i=0; i<nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    while (done == 0)
    {
        pvm_recv(master, me+1);
        pvm_upkdouble(data_mat, disize*comm_gain, 1);

        if (data_mat[0] < 0)
            done = 1;

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize*comm_gain, 1);
        pvm_send(tids[child], child+1);

    } /* end of while done == 0 loop */

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

} /* End of Slave program p4d.c */
/************************************************************
   Slave program for node P5A
************************************************************/

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define child 9

main()
{
    int i, k, mytid, master;
    int tids[20], my_wt, comm_gain;
    int nproc, msgtype, me, disize, dosize;
    int mum, mumtot=0, done=0;
    double data_mat[10000];

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 20);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 25);
    pvm_upkint(&disize, 1, 1);
    pvm_upkint(&dosize, 1, 1);

    for (i=0; i<nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    while (done == 0)
    {
        pvm_recv(master, me+1);
        pvm_upkdouble(data_mat, disize*comm_gain, 1);

        if (data_mat[0] < 0)
            done = 1;

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize*comm_gain, 1);
        pvm_send(tids[child], child+1);

    } /* end of while done == 0 loop */

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

} /* End of Slave program p5a.c */
/************************************************************
   Slave program for node P5B
************************************************************/

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define parent 7
#define child  4

main()
{
    int i, k, mytid, master;
    int tids[20], my_wt, comm_gain;
    int nproc, msgtype, me, dosize, disize;
    int mum, mumtot=0, done=0;
    double data_mat[10000];

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 20);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 25);
    pvm_upkint(&disize, 1, 1);
    pvm_upkint(&dosize, 1, 1);

    for (i=0; i<nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    while (done == 0)
    {
        pvm_recv(tids[parent], me+1);
        pvm_upkdouble(data_mat, disize*comm_gain, 1);

        if (data_mat[0] < 0)
            done = 1;

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize*comm_gain, 1);
        pvm_send(tids[child], child+1);

    } /* end of while done == 0 loop */

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

} /* End of Slave program p5b.c */
#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define parent1 1
#define parent2 4
#define parent3 8

main()
{
    int i, k, mytid, master, disize, disize2, disize3;
    int tids[20], my_wt, comm_gain;
    int nproc, msgtype, me, go_now = 1;
    int mum, mumtot = 0, done = 0;
    double data_mat[10000];
    int oldintv = 0, newintv = 0, tcalc;
    FILE *ofp;
    struct timeval stime;

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 20);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);
    my_wt = my_wt*2;
    pvm_recv(master, 25);
    pvm_upkint(&disize, 1, 1);
    pvm_upkint(&disize2, 1, 1);
    pvm_upkint(&disize3, 1, 1);

    for (i = 0; i < nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    ofp = fopen("/home3/stone/Thesis/matlab_files/No_sched.out", "w");

    while (done == 0)
    {
        /* wait for the data from each of the three parent nodes */
        pvm_recv(tids[parent1], me+1);
        pvm_upkdouble(data_mat, disize*comm_gain, 1);

        pvm_recv(tids[parent2], me+1);
        pvm_upkdouble(data_mat, disize2*comm_gain, 1);

        pvm_recv(tids[parent3], me+1);
        pvm_upkdouble(data_mat, disize3*comm_gain, 1);

        if (data_mat[0] < 0)
            done = 1;

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        /* record the time since the previous iteration, in microseconds */
        gettimeofday(&stime, (struct timeval *)0);
        newintv = stime.tv_sec*1000000 + stime.tv_usec;
        tcalc = newintv - oldintv;

        fprintf(ofp, "\n %d", tcalc);
        oldintv = newintv;

    }   /* end of while done == 0 loop */

    fclose(ofp);

    /* Tell the Master all slaves have terminated */
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&go_now, 1, 1);   /* go_now is an int, so pack it as one */
    pvm_send(master, 35);

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

}   /* End of Slave program p5c.c */
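
The interval timing in the listing above stores tv_sec*1000000 + tv_usec in an int. That product does not fit in 32 bits for any present-day clock value, and the first sample is measured against oldintv = 0, so it does not represent a loop period. The following is a minimal sketch of the same measurement in 64-bit arithmetic. The helper names timeval_to_usec and interval_usec are hypothetical and are not part of the thesis code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: convert a timeval to microseconds using 64-bit math. */
long long timeval_to_usec(const struct timeval *tv)
{
    assert(tv != NULL);
    return (long long)tv->tv_sec * 1000000LL + (long long)tv->tv_usec;
}

/* Hypothetical helper: microseconds elapsed since the time stored in *last.
 * *last is updated to the current time on every call. A negative *last
 * marks "no previous sample"; in that case -1 is returned so the caller
 * can skip the first, meaningless interval. */
long long interval_usec(long long *last)
{
    struct timeval now;
    long long t, delta;

    assert(last != NULL);
    gettimeofday(&now, NULL);
    t = timeval_to_usec(&now);
    delta = (*last < 0) ? -1 : t - *last;
    *last = t;
    return delta;
}
```

In the slave loop this would replace the newintv/oldintv pair: initialize a long long to -1 before the loop, call interval_usec once per iteration, and write the result only when it is non-negative.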
APPENDIX E - SCHEDULED NODE PROCESSING PROGRAMS


MASTER PROGRAM (PROCESSOR 1):

Scheduled master program, also handles node Processor 1 communication and execution
requirements.

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define done_lp 1000    /* Loop iteration counter */
#define snl 400         /* Iteration number for start of network loading */

/* GRAPH TIME VARIABLES */
int ld_num = 55000;
int p1top2 = 300;           /* These variables contain the interprocessor comm */
int p1top3 = 300;           /* costs, read as Processor i to Processor j */
int p1top4 = 300;
int p1top5 = 300 + 300;
int p2top3 = 350 + 350;
int p2top4 = 350 + 350;
int p2top5 = 300;
int p3top1 = 525;
int p3top5 = 300 + 525;
int p4top1 = 350;
int p4top2 = 400 + 350;
int p4top3 = 350;
int p5top2 = 300;
int p5top4 = 900;
int comm_gain;              /* For varying communication weights */
int my_wt;                  /* For varying execution weights */

/* GLOBAL VARIABLES */
int nproc = 4;
int mytid;                  /* my task id */
int tids[20];               /* slave task ids */
int done = 0, who;
double data_mat[9000], go_now;

main()
{
    char SLAVENAME[3];
    int i, k;
    int mum, mumtot;
    char myname[5];
    struct itimerval tmrval;
    int snproc = 3, stids[5], wnl = 0, nl_done = 0;

    printf("\nComm wt = ");
    scanf("%d", &comm_gain);
    printf("\nEx wt = ");
    scanf("%d", &my_wt);
    my_wt = my_wt*4;

    printf("\nWith NET loading type 1, without NET loading type 2: ");
    scanf("%d", &wnl);

    /* initialize matrices */
    for (k = 0; k < 1500; k++)
        data_mat[k] = (double)k + 5.66666;

    /* enroll in pvm */
    mytid = pvm_mytid();

    /* start up slave tasks */
    gethostname(myname, 5);

    pvm_spawn("p2", NULL, 1, "sun3", 1, &tids[0]);
    pvm_spawn("p3", NULL, 1, "sun5", 1, &tids[1]);
    pvm_spawn("p4", NULL, 1, "sun9", 1, &tids[2]);
    pvm_spawn("p5", NULL, 1, "sun20", 1, &tids[3]);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&nproc, 1, 1);
    pvm_pkint(tids, nproc, 1);
    pvm_pkint(&my_wt, 1, 1);
    pvm_pkint(&comm_gain, 1, 1);
    pvm_mcast(tids, nproc, 10);

    /* input and output costs for Processor 2 */
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&p1top2, 1, 1);
    pvm_pkint(&p4top2, 1, 1);
    pvm_pkint(&p5top2, 1, 1);
    pvm_pkint(&p2top3, 1, 1);
    pvm_pkint(&p2top4, 1, 1);
    pvm_pkint(&p2top5, 1, 1);
    pvm_send(tids[0], 20);

    /* input and output costs for Processor 3 */
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&p1top3, 1, 1);
    pvm_pkint(&p2top3, 1, 1);
    pvm_pkint(&p4top3, 1, 1);
    pvm_pkint(&p3top1, 1, 1);
    pvm_pkint(&p3top5, 1, 1);
    pvm_send(tids[1], 20);

    /* input and output costs for Processor 4 */
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&p1top4, 1, 1);
    pvm_pkint(&p2top4, 1, 1);
    pvm_pkint(&p5top4, 1, 1);
    pvm_pkint(&p4top1, 1, 1);
    pvm_pkint(&p4top2, 1, 1);
    pvm_pkint(&p4top3, 1, 1);
    pvm_send(tids[2], 20);

    /* input and output costs for Processor 5 */
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&p1top5, 1, 1);
    pvm_pkint(&p2top5, 1, 1);
    pvm_pkint(&p3top5, 1, 1);
    pvm_pkint(&p5top2, 1, 1);
    pvm_pkint(&p5top4, 1, 1);
    pvm_send(tids[3], 20);

    if (wnl == 1)
    {
        pvm_spawn("s1", NULL, 1, "sun3", 1, &stids[0]);
        pvm_spawn("s2", NULL, 1, "sun20", 1, &stids[1]);
        pvm_spawn("s3", NULL, 1, "sun8", 1, &stids[2]);

        pvm_initsend(PvmDataDefault);
        pvm_pkint(&snproc, 1, 1);
        pvm_pkint(stids, snproc, 1);
        pvm_mcast(stids, snproc, 62);
    }

    /* Begin User Program */
    for (done = 0; done < done_lp; done++)
    {
        if (done == snl && wnl == 1)
        {
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&ld_num, 1, 1);
            pvm_send(stids[0], 22);
        }

        if (done == done_lp - 1)
            data_mat[0] = -444.555;

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, p1top5*comm_gain, 1);
        pvm_send(tids[3], 4);

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, p1top4*comm_gain, 1);
        pvm_send(tids[2], 3);

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, p1top3*comm_gain, 1);
        pvm_send(tids[1], 2);

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, p1top2*comm_gain, 1);
        pvm_send(tids[0], 1);

        /* Execution core section */
        for (k = 0; k < my_wt*2; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        printf("\nOn loop number %d\n", done);

        if (data_mat[0] >= 0)
        {
            pvm_recv(tids[1], 2);
            pvm_upkdouble(data_mat, p3top1*comm_gain, 1);

            pvm_recv(tids[2], 3);
            pvm_upkdouble(data_mat, p4top1*comm_gain, 1);

            pvm_recv(tids[3], 4);
            pvm_upkdouble(&go_now, 1, 1);
        }

    }   /* end of for loop */

    printf("\nThe loop is done\n");

    /* Ensure all slaves have quit prior to termination */
    for (i = 0; i < nproc; i++)
    {
        pvm_recv(-1, 35);
        pvm_upkint(&who, 1, 1);
    }

    printf("\nProgram m2.c cwt = %d, ewt = %d is done\n", comm_gain, my_wt/4);

    /* Program Finished exit PVM before stopping */
    pvm_exit();

}   /** END OF MAIN PROGRAM **/
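
Every node in these listings is simulated by the same synthetic workload: a weighted number of passes over a 1360-iteration loop that draws a random number and rewrites data_mat[1] through data_mat[1360]. Element 0 is never written by the workload, which is what allows data_mat[0] to carry the negative termination flag from processor to processor. A minimal sketch of that workload as one reusable function is given below. The names simulate_node and CORE_LEN are hypothetical, and the running sum is kept unsigned here (unlike the int in the listings) only to avoid signed overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CORE_LEN 1360   /* inner-loop length used throughout the listings */

/* Hypothetical helper: run `weight` passes of the synthetic node workload.
 * data must have room for at least CORE_LEN + 1 doubles; data[0] is left
 * untouched so it can keep carrying the termination flag.
 * Returns the running sum of the random draws. */
unsigned long simulate_node(int weight, double *data)
{
    unsigned long total = 0;
    int k, i;

    assert(data != NULL);
    for (k = 0; k < weight; k++)
        for (i = 0; i < CORE_LEN; i++)
        {
            total += (unsigned long)rand();
            data[i + 1] = i + 1;
        }
    return total;
}
```

With such a helper, the three cores of the master above would read simulate_node(my_wt*2, data_mat) followed by two calls of simulate_node(my_wt, data_mat).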
THE INDIVIDUAL SLAVE PROGRAMS:

Slave program for scheduled Processor 2

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define parent1 2
#define parent2 3

main()
{
    int i, k, mytid, master;
    int tids[20], stids[20], my_wt, comm_gain;
    int nproc, msgtype, me, snproc;
    int mum, mumtot = 0, done = 0;
    double data_mat[10000];

    /* These hold the input cost, "i", or the output cost, "o" */
    int disize1, disize4, disize5, dosize3, dosize4, dosize5;

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 10);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 20);
    pvm_upkint(&disize1, 1, 1);
    pvm_upkint(&disize4, 1, 1);
    pvm_upkint(&disize5, 1, 1);
    pvm_upkint(&dosize3, 1, 1);
    pvm_upkint(&dosize4, 1, 1);
    pvm_upkint(&dosize5, 1, 1);

    for (i = 0; i < nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    pvm_recv(master, me+1);
    pvm_upkdouble(data_mat, disize1*comm_gain, 1);

    while (done == 0)
    {
        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize5*comm_gain, 1);
        pvm_send(tids[3], 4);

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize4*comm_gain, 1);
        pvm_send(tids[2], 3);

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize3*comm_gain, 1);
        pvm_send(tids[1], 2);

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)     /* simulates node A */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)     /* simulates node B */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)     /* simulates node C */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)     /* simulates node D */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        pvm_recv(tids[2], me+1);
        pvm_upkdouble(data_mat, disize4*comm_gain, 1);

        pvm_recv(tids[3], me+1);
        pvm_upkdouble(data_mat, disize5*comm_gain, 1);

        pvm_recv(master, me+1);
        pvm_upkdouble(data_mat, disize1*comm_gain, 1);

        if (data_mat[0] < 0)
        {
            done = 1;

            pvm_initsend(PvmDataDefault);
            pvm_pkdouble(data_mat, dosize3*comm_gain, 1);
            pvm_send(tids[1], 2);
        }

    }   /* end of while done == 0 loop */

    /* Inform the master I have terminated */
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&me, 1, 1);
    pvm_send(master, 35);

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

}   /* End of Slave program p2.c */

Slave program for scheduled processor 3

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define parent1 3
#define done_tag 45

main()
{
    int i, k, mytid, master;
    int tids[20], stids[20], my_wt, comm_gain;
    int nproc, msgtype, me, snproc;
    int mum, mumtot = 0, done = 0;
    double data_mat[10000];
    double go_now = 55.55;
    int disize1, disize2, disize4, dosize1, dosize5;

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 10);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 20);
    pvm_upkint(&disize1, 1, 1);
    pvm_upkint(&disize2, 1, 1);
    pvm_upkint(&disize4, 1, 1);
    pvm_upkint(&dosize1, 1, 1);
    pvm_upkint(&dosize5, 1, 1);

    for (i = 0; i < nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    pvm_recv(tids[0], 2);
    pvm_upkdouble(data_mat, disize2*comm_gain, 1);

    while (done == 0)
    {
        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize5*comm_gain, 1);
        pvm_send(tids[3], 4);

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize1*comm_gain, 1);
        pvm_send(master, me+1);

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(&go_now, 1, 1);
        pvm_send(tids[2], 20);

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)     /* simulates node A */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)     /* simulates node B */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt*2; k++)   /* simulates node C */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        pvm_recv(master, me+1);
        pvm_upkdouble(data_mat, disize1*comm_gain, 1);

        pvm_recv(tids[2], me+1);
        pvm_upkdouble(data_mat, disize4*comm_gain, 1);

        pvm_recv(tids[0], me+1);
        pvm_upkdouble(data_mat, disize2*comm_gain, 1);

        if (data_mat[0] < 0)
        {
            done = 1;
            go_now = -34.33;

            pvm_initsend(PvmDataDefault);
            pvm_pkdouble(&go_now, 1, 1);
            pvm_send(tids[2], 20);
        }

    }   /* end of while done == 0 loop */

    /* Inform the master I have terminated */
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&me, 1, 1);
    pvm_send(master, 35);

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

}   /* End of Slave program p3.c */


Slave program for scheduled processor 4

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define parent1 2
#define parent2 3

main()
{
    int i, k, mytid, master;
    int tids[20], my_wt, comm_gain;
    int nproc, msgtype, me, snproc;
    int mum, mumtot = 0, done = 0;
    double data_mat[10000];
    double go_now;
    int disize1, disize2, disize5, dosize1, dosize2, dosize3;

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 10);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 20);
    pvm_upkint(&disize1, 1, 1);
    pvm_upkint(&disize2, 1, 1);
    pvm_upkint(&disize5, 1, 1);
    pvm_upkint(&dosize1, 1, 1);
    pvm_upkint(&dosize2, 1, 1);
    pvm_upkint(&dosize3, 1, 1);

    for (i = 0; i < nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    data_mat[0] = 456.3333;

    pvm_recv(tids[1], 20);
    pvm_upkdouble(&go_now, 1, 1);

    while (done == 0)
    {
        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize1*comm_gain, 1);
        pvm_send(master, me+1);

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize2*comm_gain, 1);
        pvm_send(tids[0], 1);

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize3*comm_gain, 1);
        pvm_send(tids[1], 2);

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(&go_now, 1, 1);
        pvm_send(tids[3], 4);

        if (data_mat[0] < 0)
            done = 1;

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)     /* simulates node A */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)     /* simulates node B */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)     /* simulates node C */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)     /* simulates node D */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        pvm_recv(tids[0], me+1);
        pvm_upkdouble(data_mat, disize2*comm_gain, 1);

        pvm_recv(tids[3], me+1);
        pvm_upkdouble(data_mat, disize5*comm_gain, 1);

        pvm_recv(master, me+1);
        pvm_upkdouble(data_mat, disize1*comm_gain, 1);

        pvm_recv(tids[me-1], 20);
        pvm_upkdouble(&go_now, 1, 1);

        if (go_now < 0)
        {
            done = 1;

            pvm_initsend(PvmDataDefault);
            pvm_pkdouble(&go_now, 1, 1);
            pvm_send(tids[3], 4);
        }

    }   /* end of while done == 0 loop */

    /* Inform the master I have terminated */
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&me, 1, 1);
    pvm_send(master, 35);

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

}   /* End of Slave program p4.c */

Slave program for scheduled processor 5

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* CONSTANTS */
#define parent1 2
#define parent2 3

main()
{
    int i, k, mytid, master;
    int tids[20], my_wt, comm_gain;
    int nproc, msgtype, me;
    int mum, mumtot = 0, done = 0;
    double data_mat[10000], go_now;
    FILE *ofp;
    int oldintv = 0, newintv = 0, tcalc;
    struct timeval stime;

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 10);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 20);
    pvm_upkint(&disize1, 1, 1);
    pvm_upkint(&disize2, 1, 1);
    pvm_upkint(&disize3, 1, 1);
    pvm_upkint(&dosize2, 1, 1);
    pvm_upkint(&dosize4, 1, 1);

    for (i = 0; i < nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    ofp = fopen("/home3/stone/Thesis/matlab_files/Sched.out", "w");
    fprintf(ofp, "\n %d", comm_gain);
    fprintf(ofp, "\n %d", my_wt/4);

    data_mat[0] = 456.33333;

    pvm_recv(tids[2], me+1);
    pvm_upkdouble(&go_now, 1, 1);

    while (done == 0)
    {
        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize2*comm_gain, 1);
        pvm_send(tids[0], 1);

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, dosize4*comm_gain, 1);
        pvm_send(tids[2], 3);

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(&go_now, 1, 1);
        pvm_send(master, me+1);

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)     /* simulates node A */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)     /* simulates node B */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt*2; k++)   /* simulates node C */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        gettimeofday(&stime, (struct timeval *)0);
        newintv = stime.tv_sec*1000000 + stime.tv_usec;
        tcalc = newintv - oldintv;

        fprintf(ofp, "\n%d", tcalc);
        oldintv = newintv;

        pvm_recv(tids[0], me+1);
        pvm_upkdouble(data_mat, disize2*comm_gain, 1);

        pvm_recv(tids[1], me+1);
        pvm_upkdouble(data_mat, disize3*comm_gain, 1);

        pvm_recv(master, me+1);
        pvm_upkdouble(data_mat, disize1*comm_gain, 1);

        pvm_recv(tids[2], me+1);
        pvm_upkdouble(&go_now, 1, 1);

        if (go_now < 0)
            done = 1;

    }   /* end of while done == 0 loop */

    fclose(ofp);

    /* Inform the master I have terminated */
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&me, 1, 1);
    pvm_send(master, 35);

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

}   /* End of Slave program p5.c */

/* done = pvm_probe(master, done_tag); */
APPENDIX F - HARDWARE MULTICAST NODE PROCESSING PROGRAMS


MASTER PROGRAM (PROCESSOR 1):

Scheduled master program using hardware implemented message multicasts

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <stdlib.h>

/* CONSTANTS */
#define done_lp 1000
#define snl 400

/* GRAPH TIME VARIABLES */
int p1out = 300 + 300 + 300 + 300 + 300;
int p2out = 350 + 350 + 350 + 350 + 300;
int p3out = 525 + 300 + 525;
int p4out = 350 + 400 + 350 + 350;
int p5out = 300 + 900;
int ld_num = 55000;

/* GLOBAL VARIABLES */
int nproc = 4;
int mytid;                  /* my task id */
int tids[20];               /* slave task ids */
int who, done = 0;
double data_mat[11000], go_now;

main()
{
    int i, k, mum, mumtot;
    int snproc = 3, stids[5], wnl = 0, nl_done = 0;
    struct itimerval tmrval;
    int comm_gain, my_wt;

    printf("\nComm wt = ");
    scanf("%d", &comm_gain);
    printf("\nEx wt = ");
    scanf("%d", &my_wt);
    my_wt = my_wt*4;

    printf("\nWith NET loading type 1, without NET loading type 2: ");
    scanf("%d", &wnl);

    /* initialize matrices */
    for (k = 0; k < 1500; k++)
        data_mat[k] = (double)k + 5.66666;

    /* enroll in pvm */
    mytid = pvm_mytid();

    /* start up slave tasks */
    pvm_spawn("p2h", NULL, 1, "sun3", 1, &tids[0]);
    pvm_spawn("p3h", NULL, 1, "sun5", 1, &tids[1]);
    pvm_spawn("p4h", NULL, 1, "sun9", 1, &tids[2]);
    pvm_spawn("p5h", NULL, 1, "sun20", 1, &tids[3]);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&nproc, 1, 1);
    pvm_pkint(tids, nproc, 1);
    pvm_pkint(&my_wt, 1, 1);
    pvm_pkint(&comm_gain, 1, 1);
    pvm_mcast(tids, nproc, 10);

    /* each slave gets the size of its input and of its output message */
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&p1out, 1, 1);
    pvm_pkint(&p2out, 1, 1);
    pvm_send(tids[0], 20);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&p2out, 1, 1);
    pvm_pkint(&p3out, 1, 1);
    pvm_send(tids[1], 20);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&p3out, 1, 1);
    pvm_pkint(&p4out, 1, 1);
    pvm_send(tids[2], 20);

    pvm_initsend(PvmDataDefault);
    pvm_pkint(&p4out, 1, 1);
    pvm_pkint(&p5out, 1, 1);
    pvm_send(tids[3], 20);

    if (wnl == 1)
    {
        pvm_spawn("s1", NULL, 1, "sun3", 1, &stids[0]);
        pvm_spawn("s2", NULL, 1, "sun20", 1, &stids[1]);
        pvm_spawn("s3", NULL, 1, "sun8", 1, &stids[2]);

        pvm_initsend(PvmDataDefault);
        pvm_pkint(&snproc, 1, 1);
        pvm_pkint(stids, snproc, 1);
        pvm_mcast(stids, snproc, 62);
    }

    /* Begin User Program */
    for (done = 0; done < done_lp; done++)
    {
        if (done == snl && wnl == 1)
        {
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&ld_num, 1, 1);
            pvm_send(stids[0], 22);
        }

        if (done == done_lp - 1)
            data_mat[0] = -444.555;

        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, p1out*comm_gain, 1);
        pvm_send(tids[0], 12);

        /* Slave execution cores for P1A, P1B, and P1C */
        for (k = 0; k < my_wt*2; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        printf("\nOn loop number %d\n", done);

        if (data_mat[0] >= 0)
        {
            pvm_recv(tids[3], 51);
            pvm_upkdouble(data_mat, p5out*comm_gain, 1);
        }

    }   /* end of for loop */

    printf("\nThe loop is done\n");

    /* Ensure all slaves have quit prior to termination */
    for (i = 0; i < nproc; i++)
    {
        pvm_recv(-1, 35);
        pvm_upkint(&who, 1, 1);
    }

    printf("\nProgram m2h.c cwt = %d, ewt = %d is done\n", comm_gain, my_wt/4);

    /* Program Finished exit PVM before stopping */
    pvm_exit();

}   /** END OF MAIN PROGRAM **/
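
The p1out through p5out constants of the hardware multicast master are written as sums, and each sum matches the point-to-point costs that the Appendix E master assigns to the same sending processor (for example, p3out = 525 + 300 + 525 combines p3top1 = 525 and p3top5 = 300 + 525). The sketch below checks that correspondence by storing the Appendix E costs in a table and summing each row. The table layout and the names cost, NPROC, and total_out are hypothetical and are not part of the thesis code.

```c
#include <assert.h>

#define NPROC 5

/* cost[i][j]: point-to-point communication cost from Processor i+1 to
 * Processor j+1, copied from the "piTOpj" variables of the Appendix E
 * master. A zero means the two processors do not exchange data. */
static const int cost[NPROC][NPROC] = {
    /* to:  P1    P2         P3         P4         P5        */
    {        0,  300,       300,       300,       300 + 300 },  /* from P1 */
    {        0,    0,       350 + 350, 350 + 350, 300       },  /* from P2 */
    {      525,    0,       0,         0,         300 + 525 },  /* from P3 */
    {      350,  400 + 350, 350,       0,         0         },  /* from P4 */
    {        0,  300,       0,         900,       0         },  /* from P5 */
};

/* Total output of one processor (proc is 1..5), which is the size of the
 * single message it sends in the multicast version. */
int total_out(int proc)
{
    int j, sum = 0;

    assert(proc >= 1 && proc <= NPROC);
    for (j = 0; j < NPROC; j++)
        sum += cost[proc - 1][j];
    return sum;
}
```

The row sums are 1500, 1700, 1350, 1450, and 1200, the same values that the expressions for p1out through p5out evaluate to.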
THE INDIVIDUAL SLAVE PROGRAMS:

Slave program for scheduled w/ hardware multicast for Processor 2

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <stdlib.h>

main()
{
    int tids[20], my_wt, comm_gain;
    int nproc, msgtype, me, i, k, mytid, master;
    int mum, mumtot = 0, done = 0;
    double data_mat[11000];
    int p1out, p2out, p3out, p4out, p5out;

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 10);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 20);
    pvm_upkint(&p1out, 1, 1);
    pvm_upkint(&p2out, 1, 1);

    for (i = 0; i < nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    pvm_recv(master, 12);
    pvm_upkdouble(data_mat, p1out*comm_gain, 1);

    while (done == 0)
    {
        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, p2out*comm_gain, 1);
        pvm_send(tids[1], 23);

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)     /* simulates node A */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)     /* simulates node B */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)     /* simulates node C */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)     /* simulates node D */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        pvm_recv(master, 12);
        pvm_upkdouble(data_mat, p1out*comm_gain, 1);

        if (data_mat[0] < 0)
        {
            done = 1;

            pvm_initsend(PvmDataDefault);
            pvm_pkdouble(data_mat, p2out*comm_gain, 1);
            pvm_send(tids[1], 23);
        }

    }   /* end of while done == 0 loop */

    /* Inform the master I have terminated */
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&me, 1, 1);
    pvm_send(master, 35);

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

}   /* End of Slave program p2.c */

Slave program for scheduled hardware multicast for Processor 3

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

main()
{
    int i, k, mytid, master, nproc, msgtype, me;
    int tids[20], stids[20], my_wt, comm_gain;
    int mum, mumtot = 0, done = 0;
    double data_mat[11000], go_now = 55.55;
    int p1out, p2out, p3out, p4out, p5out, snproc;

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 10);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 20);
    pvm_upkint(&p2out, 1, 1);
    pvm_upkint(&p3out, 1, 1);

    for (i = 0; i < nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    pvm_recv(tids[0], 23);
    pvm_upkdouble(data_mat, p2out*comm_gain, 1);

    while (done == 0)
    {
        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, p3out*comm_gain, 1);
        pvm_send(tids[2], 34);

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)     /* simulates node A */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)     /* simulates node B */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt*2; k++)   /* simulates node C */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        pvm_recv(tids[0], 23);
        pvm_upkdouble(data_mat, p2out*comm_gain, 1);

        if (data_mat[0] < 0)
        {
            done = 1;

            pvm_initsend(PvmDataDefault);
            pvm_pkdouble(data_mat, p3out*comm_gain, 1);
            pvm_send(tids[2], 34);
        }

    }   /* end of while done == 0 loop */

    /* Inform the master I have terminated */
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&me, 1, 1);
    pvm_send(master, 35);

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

}   /* End of Slave program p3h.c */

Slave program for scheduled hardware multicast for Processor 4

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

main()
{
    int i, k, mytid, master, nproc, msgtype, me;
    int tids[20], my_wt, comm_gain;
    int mum, mumtot = 0, done = 0;
    double data_mat[10000], go_now;
    int p3out, p4out;

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 10);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 20);
    pvm_upkint(&p3out, 1, 1);
    pvm_upkint(&p4out, 1, 1);

    for (i = 0; i < nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    data_mat[0] = 456.3333;

    pvm_recv(tids[1], 34);
    pvm_upkdouble(data_mat, p3out*comm_gain, 1);

    while (done == 0)
    {
        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, p4out*comm_gain, 1);
        pvm_send(tids[3], 45);

        if (data_mat[0] < 0)
            done = 1;

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)     /* simulates node A */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)     /* simulates node B */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)     /* simulates node C */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)     /* simulates node D */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        pvm_recv(tids[1], 34);
        pvm_upkdouble(data_mat, p3out*comm_gain, 1);

        if (data_mat[0] < 0)
        {
            done = 1;

            pvm_initsend(PvmDataDefault);
            pvm_pkdouble(data_mat, p4out*comm_gain, 1);
            pvm_send(tids[3], 45);
        }

    }   /* end of while done == 0 loop */

    /* Inform the master I have terminated */
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&me, 1, 1);
    pvm_send(master, 35);

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

}   /* End of Slave program p4h.c */

Slave program for scheduled hardware multicast for Processor 5

#include "pvm3.h"
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <sys/types.h>
#include <stdlib.h>

main()
{
    int i, k, mytid, master, nproc, msgtype, me;
    int tids[20], my_wt, comm_gain;
    int mum, mumtot = 0, done = 0;
    double data_mat[10000], go_now;
    int oldintv = 0, newintv = 0, tcalc, p4out, p5out;
    FILE *ofp;
    struct timeval stime;

    /* enroll in pvm */
    mytid = pvm_mytid();
    master = pvm_parent();

    pvm_recv(master, 10);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&my_wt, 1, 1);
    pvm_upkint(&comm_gain, 1, 1);

    pvm_recv(master, 20);
    pvm_upkint(&p4out, 1, 1);
    pvm_upkint(&p5out, 1, 1);

    for (i = 0; i < nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    ofp = fopen("/home3/stone/Thesis/matlab_files/Sched_hs.out", "w");
    data_mat[0] = 456.33333;

    pvm_recv(tids[2], 45);
    pvm_upkdouble(data_mat, p4out*comm_gain, 1);

    while (done == 0)
    {
        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data_mat, p5out*comm_gain, 1);
        pvm_send(master, 51);

        /** slave execution core **/
        for (k = 0; k < my_wt; k++)     /* simulates node A */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt; k++)     /* simulates node B */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        for (k = 0; k < my_wt*2; k++)   /* simulates node C */
            for (i = 0; i < 1360; i++)
            {
                mum = rand();
                mumtot = mumtot + mum;
                data_mat[i+1] = i + 1;
            }

        gettimeofday(&stime, (struct timeval *)0);
        newintv = stime.tv_sec*1000000 + stime.tv_usec;
        tcalc = newintv - oldintv;

        fprintf(ofp, "\n%d", tcalc);
        oldintv = newintv;

        pvm_recv(tids[2], 45);
        pvm_upkdouble(data_mat, p4out*comm_gain, 1);

        if (data_mat[0] < 0)
            done = 1;

    }   /* end of while done == 0 loop */

    fclose(ofp);

    /* Inform the master I have terminated */
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&me, 1, 1);
    pvm_send(master, 35);

    /* Program finished. Exit PVM before stopping */
    pvm_exit();

}   /* End of Slave program p5.c */
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